Quantum algorithm for the nonlinear dimensionality reduction with arbitrary kernel

YaoChong Li; Ri-Gui Zhou; RuiQing Xu; WenWen Hu; Ping Fan

doi:10.1088/2058-9565/abbe66

1. Introduction

Data mining and pattern recognition [1] achieve unprecedented success in many fields as a result of the advent of the era of big data and the development of hardware. Meanwhile, as an essential part of the machine learning, dimensionality reduction (DR) [2] is usually employed to extract low dimensional, useful information from high dimensional, redundant data without a significant loss. However, classical DR approaches are always accompanied by the high time complexity due to involve the large-scale matrix computation and the eigenvalue solving problem.

On the other hand, quantum science and technology has been applied in various sub-domains of information science in the past decades, such as image processing [3] and network communication [4]. It has been demonstrated that quantum computation may provide a potential acceleration in some specific problems owing to its unique property of supposition and entanglement [5]. Particularly, many recent contributions [6–21] are dedicated to developing the quantum machine learning (QML) algorithms by integrating classical machine learning with quantum computation for obtaining a potential quantum speedup. There are two comprehensive reviews of QML, please refer to [22, 23]. However, although the complexity is significantly lower than the classic counterpart, the existing QML researches are always restricted to the linear case since the operators in the quantum computation are always linear and unitary [24, 25]. It is generally known that the straightforward linear machine learning models only suited for linearly separable problems, which is insufficient for many real-world inseparable ones. Hence, introducing the general nonlinearity in the linear QML models is deemed as a necessary and nontrivial task.

One of the most common ways to introduce nonlinearity in classical machine learning is the kernel method, its effectiveness on the classification, clustering, regression, and DR tasks have been demonstrated in lots of literature. Motivated by this, for introducing the general nonlinearity in the QML models, the approach to construct the Hamiltonian simulation with an arbitrary nonlinear kernel is proposed in this paper. Based on this, the quantum kernel principal component analysis (QKPCA) algorithm is investigated further, which can be implemented on a universal quantum computer for performing nonlinear DR at the same time the exponential acceleration is also preserved. Moreover, the quantum versions of other common nonlinear DR methods are also discussed by utilizing the basic framework of QKPCA. Finally, since the generality of the kernel method, this work can be easily transferred into the other linear QML models, not confined to the quantum DR techniques.

The remainder of this paper is organized as follows. Section 2 introduces the preliminary concepts of related technology necessary to understand this work. The proposed scheme is detailed in section 3, including the implementation of QKPCA and its extension to other nonlinear DR methods. Section 4 presents a brief discussion about the limitation of existing related techniques and potential applications on other linear QML models. Finally, the conclusion and future work are discussed in section 5.

2. Preliminaries

2.1. Quantum computation background

The basic concept of quantum computation is described shortly in this subsection and the detailed instruction can be found in [26].

In analogy to the bit in the classical information, the elementary unit of quantum information is defined as the qubit, which is denoted by the Dirac notation: |0⟩ = [1 0]^T, |1⟩ = [0 1]^T. Different from the classical bit, a qubit can be existed in both |0⟩ and |1⟩ simultaneously, called a superposition state: |φ⟩ = α|0⟩ + β|1⟩ = [α β]^T. Where $\alpha ,\beta \in \mathbb{C}$ and |α|² + |β|² = 1. When performing the quantum measurement, it will collapse to the state |0⟩ with probability |α|², or to the state |0⟩ with probability |β|². Another unique property of quantum state is the entanglement, an intuitive example is the Bell state $\left(\vert 00\rangle +\vert 11\rangle \right)/\sqrt{2}$ , which cannot be represented as a tensor product of two independent qubits.

In the conventional digital circuit, the manipulations on bits are performed by the classical logic gates. The equivalent building blocks in the quantum circuit are called quantum gates. It should be mentioned that a quantum gate U must be a unitary operation (UU^† = I) and implemented reversibly, which is different from the classical context. More precisely, the quantum circuit composed by a group of quantum gates is employed to transform the quantum state into another one.

2.2. Kernel principal component analysis

To facilitate understanding, the outline of the classical kernel principal component analysis (KPCA) algorithm is presented herein.

As one of the most celebrated DR instances, principal component analysis (PCA) [27] projects the original dataset onto a low-dimensional space with preserving the data variance in maximum. To tackle the nonlinear problem in machine learning, the kernel generalization of the PCA was proposed in [28], namely KPCA. The main idea of KPCA is computing the principal components in high-dimensional feature spaces by means of integral operators and nonlinear kernel functions. Some notations are declared firstly: the original dataset is described as a data matrix $\mathbf{X}\in {\mathbb{R}}^{N{\times}D}$ , where the ith row of X is denoted as the ith observation ${\boldsymbol{x}}_{i}=\left[{x}_{i}^{1}\enspace {x}_{i}^{2}\enspace \dots \enspace {x}_{i}^{D}\right]$ . The purpose is to obtain the projected dataset $\mathbf{Z}\in {\mathbb{R}}^{N{\times}d}$ for D > d. The N × N kernel matrix is firstly defined as:

$\begin{equation}\mathbf{K}=\left[\begin{matrix}\hfill \kappa \left({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{1}\right)\hfill & \hfill \kappa \left({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2}\right)\hfill & \hfill \dots \hfill & \hfill \kappa \left({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{N}\right)\hfill \\ \hfill \kappa \left({\boldsymbol{x}}_{2},{\boldsymbol{x}}_{1}\right)\hfill & \hfill \kappa \left({\boldsymbol{x}}_{2},{\boldsymbol{x}}_{2}\right)\hfill & \hfill \dots \hfill & \hfill \kappa \left({\boldsymbol{x}}_{2},{\boldsymbol{x}}_{N}\right)\hfill \\ \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill \\ \hfill \kappa \left({\boldsymbol{x}}_{N},{\boldsymbol{x}}_{1}\right)\hfill & \hfill \kappa \left({\boldsymbol{x}}_{N},{\boldsymbol{x}}_{2}\right)\hfill & \hfill \dots \hfill & \hfill \kappa \left({\boldsymbol{x}}_{N},{\boldsymbol{x}}_{N}\right)\hfill \end{matrix}\right]\end{equation} \tag{ 1 }$

in which, the i-row, j-column item κ( x _i, x _j) is denoted as the kernel function κ on sample x _i and x _i. Some popular kernel functions are listed on table 1.

Table 1. Some common kernels used in machine learning.

Name	Kernel function	Hyper-parameters
Linear	$\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)={\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}$	None
Polynomial	$\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)={\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{r}$	$r\in {\mathbb{N}}^{+}$
Gaussian	κ( x _i, x _j) = exp(−λ ⋅ \|\| x _i − x _j\|\|²)	$\lambda \in {\mathbb{R}}^{+}$
Exponential	κ( x _i, x _j) = exp(−λ ⋅ \|\| x _i − x _j\|\|)	$\lambda \in {\mathbb{R}}^{+}$
Sigmoid	$\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)=\mathrm{tanh}\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}+c\right)$	$c\in \mathbb{R}$

The core of KPCA is solving the system of linear equations: Kv = λ v . Where the eigenvector v ₁, v ₂, ..., v _N is associated with the eigenvalue λ₁ ⩾ λ₂ ⩾ ⋯ ⩾ λ_N. Now, extracting the first d principal components v ₁, v ₂, ..., v _d correspond to the eigenvalue λ₁, λ₂, ..., λ_d. The projected sample ${\boldsymbol{z}}_{i}\in {\mathbb{R}}^{d}$ for i = 1, 2, ..., N can be obtained by:

$\begin{equation}{\boldsymbol{z}}_{i}={\left[\begin{matrix}\hfill \sqrt{{\lambda }_{1}}{v}_{1}^{i}\sqrt{{\lambda }_{2}}{v}_{2}^{i}\dots \sqrt{{\lambda }_{d}}{v}_{d}^{i}\hfill \end{matrix}\right]}^{\mathrm{T}}\end{equation} \tag{ 2 }$

${v}_{j}^{i}$ represents the ith element in the v _j for j = 1, 2, ..., d. Specifically, the ${z}_{i}^{j}$ can be computed as follows:

$\begin{equation}{z}_{i}^{j}=\frac{1}{\sqrt{{\lambda }_{j}}}\left[\begin{matrix}\hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{1}\right)\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{2}\right)\dots \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{N}\right)\hfill \end{matrix}\right]{\boldsymbol{v}}_{j}\end{equation} \tag{ 3 }$

where ${\boldsymbol{v}}_{j}={\left[{v}_{j}^{1}\enspace {v}_{j}^{2}\enspace \dots \enspace {v}_{j}^{N}\right]}^{\mathrm{T}}$ . Hence, the projected matrix Z can be denoted as:

$\begin{equation}\mathbf{Z}=\left[\begin{matrix}\hfill \sqrt{{\lambda }_{1}}{v}_{1}^{1}\hfill & \hfill \sqrt{{\lambda }_{2}}{v}_{2}^{1}\hfill & \hfill \dots \hfill & \hfill \sqrt{{\lambda }_{d}}{v}_{d}^{1}\hfill \\ \hfill \sqrt{{\lambda }_{1}}{v}_{1}^{2}\hfill & \hfill \sqrt{{\lambda }_{2}}{v}_{2}^{2}\hfill & \hfill \dots \hfill & \hfill \sqrt{{\lambda }_{d}}{v}_{d}^{2}\hfill \\ \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill \\ \hfill \sqrt{{\lambda }_{1}}{v}_{1}^{N}\hfill & \hfill \sqrt{{\lambda }_{2}}{v}_{2}^{N}\hfill & \hfill \dots \hfill & \hfill \sqrt{{\lambda }_{d}}{v}_{d}^{N}\hfill \end{matrix}\right].\end{equation} \tag{ 4 }$

3. Quantum algorithm for nonlinear DR

3.1. Quantum kernel principal component analysis

The first quantum version of PCA algorithm (QPCA) is proposed in [7], which is employed to the quantum state tomography rather than the DR tasks. Further discussions about QPCA algorithm for DR are presented in [13, 15, 16]. Taking the work in [16] as an example, although the exponential speedup is achieved, it holds for linear cases only. To solve this problem and maintain the quantum acceleration, the QKPCA algorithm is investigated in this section. Before constructing the specific procedure, several lemmas are summarized from existing works:

Lemma 1. Let A be an l × m matrix and B an m × n matrix such that ln ≠ 1, then the quantum state of AB can be obtained parallel swap test in time $\tilde {\mathcal{O}}\left({\Vert}\mathbf{A}{{\Vert}}_{F}{\Vert}\mathbf{B}{{\Vert}}_{F}/{\Vert}\mathbf{A}\mathbf{B}{{\Vert}}_{F}{\epsilon}\right)=\tilde {\mathcal{O}}\left(\sqrt{n}k/{\epsilon}\right)$ with precision , where k is the condition number of A [29].

Lemma 2. For any smooth function $f:\mathbb{C}\to \left[-1,1\right]$ , the transformation

$\begin{equation}\sum _{i=1}^{N}{\alpha }_{i}\vert i\rangle {\mapsto}\frac{1}{\sqrt{\sum {\tilde {f}}^{2}\left({\alpha }_{i}\right)}}\sum _{i=1}^{N}\tilde {f}\left({\alpha }_{i}\right)\vert i\rangle \end{equation} \tag{ 5 }$

can be implemented by a probabilistic quantum algorithm in time $\mathcal{O}\left({{\epsilon}}^{-1}\text{poly}\left(\mathrm{log}\left(N\right)\right)\right)$ . Where $\vert \tilde {f}\left({\alpha }_{j}\right)-f\left({\alpha }_{j}\right)\vert {\leqslant}{\epsilon}$ and the probability of success is given as ${\sum }_{i=1}^{N}\tilde {f}{\left({\alpha }_{i}\right)}^{2}/N$ [30].

Lemma 3. If a Hamiltonian H can be decomposed into a linear combination of efficiently realizable unitary matrices as $\mathbf{H}={\sum }_{l=1}^{L}{\alpha }_{l}{\mathbf{H}}_{l}$ . To simulate U = e^{−iH
t}, the evolution t is divided into n segments with U_n = e^{−iH
t/n}. It can be approximated as ${\tilde {\mathbf{U}}}_{n}={\sum }_{k=0}^{K}1/k!{\left(-i\mathbf{H}t/n\right)}^{k}$ with error ${\sum }_{k=K+1}^{\infty }{\left(\mathrm{ln}\enspace 2\right)}^{k}/k!$ . If the truncated series K is chosen as $\mathcal{O}\left(\mathrm{log}\left(n/{\epsilon}\right)/\mathrm{log}\enspace \mathrm{log}\left(n/{\epsilon}\right)\right)$ , then ${\Vert}{\mathbf{U}}_{n}-{\tilde {\mathbf{U}}}_{n}{\Vert}{\leqslant}{\epsilon}/n$ and the total error is within [31].

Lemma 4. If a quantum state can be prepared in time $\mathcal{O}\left({T}_{\text{in}}\right)$ , its density matrix can be simulated in time $\mathcal{O}\left({t}^{2}{{\epsilon}}^{-1}{T}_{\text{in}}\right)$ to accuracy , the specific procedure is shown in [6, 7].

Based on the above lemmas, two important theorems are derived, which are used as the basic building blocks of the QKPCA algorithm.

Theorem 1. $\vert \mathbf{K}\rangle =\frac{1}{{\Vert}\mathbf{K}{{\Vert}}_{F}}{\sum }_{i=1}^{N}{\sum }_{j=1}^{N}\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)\vert i\rangle \vert j\rangle$ is defined as the kernel matrix quantum state, which can be prepared in time $\mathcal{O}\left({{\epsilon}}^{-1}\text{poly}\left(\mathrm{log}\left(ND\right)\right)\right)$ , where ||K||_F is the Frobenius norm of K.

Proof. The preparation of quantum state |K⟩ can be implemented by two stages: performing the quantum matrix multiplication and applying the nonlinear transformation κ on the quantum amplitudes. Firstly, the quantum state corresponding to the data matrix X and its transposition X_T are defined as follows:

$\begin{equation}\begin{matrix}\hfill \vert \mathbf{X}\rangle =\frac{1}{{\Vert}\mathbf{X}{{\Vert}}_{F}}\sum _{i=1}^{N}{\Vert}{\mathbf{X}}_{i{\bullet}}{{\Vert}}_{2}\vert i\rangle \vert {\mathbf{X}}_{i{\bullet}}\rangle \hfill \\ \hfill \vert {\mathbf{X}}^{\mathrm{T}}\rangle =\frac{1}{{\Vert}{\mathbf{X}}^{\mathrm{T}}{{\Vert}}_{F}}\sum _{j=1}^{N}{\Vert}{\mathbf{X}}_{{\bullet}j}^{\mathrm{T}}{{\Vert}}_{2}\vert {\mathbf{X}}_{{\bullet}j}^{\mathrm{T}}\rangle \vert j\rangle .\hfill \end{matrix}\end{equation} \tag{ 6 }$

It can be prepared in time $\mathcal{O}\left(\mathrm{p}\mathrm{o}\mathrm{l}\mathrm{y}\left(\mathrm{log}\left(ND\right)\right)\right)$ according to the quantum random access memory (QRAM) [32]. In which,

$\begin{equation}\begin{matrix}\hfill \vert {\mathbf{X}}_{i{\bullet}}\rangle =\frac{1}{{\Vert}{\mathbf{X}}_{i{\bullet}}{{\Vert}}_{2}}\sum _{j=1}^{D}{x}_{i}^{j}\vert j\rangle \hfill \\ \hfill \vert {\mathbf{X}}_{{\bullet}j}^{\mathrm{T}}\rangle =\frac{1}{{\Vert}{\mathbf{X}}_{{\bullet}j}^{\mathrm{T}}{{\Vert}}_{2}}\sum _{i=1}^{D}{x}_{j}^{i}\vert i\rangle .\hfill \end{matrix}\end{equation} \tag{ 7 }$

Then, employing quantum matrix multiplication given in the lemma 1 on quantum state |X⟩ and |X^T⟩, the output quantum state is described as:

$\begin{equation}\frac{1}{{\Vert}\mathbf{X}{{\Vert}}_{F}{\Vert}{\mathbf{X}}^{\mathrm{T}}{{\Vert}}_{F}}\sum _{i,j=1}^{N}{\Vert}{\mathbf{X}}_{i{\bullet}}{{\Vert}}_{2}{\Vert}{\mathbf{X}}_{{\bullet}j}^{\mathrm{T}}{{\Vert}}_{2}\langle {\mathbf{X}}_{i{\bullet}}\vert {\mathbf{X}}_{{\bullet}j}^{\mathrm{T}}\rangle \vert i,j\rangle .\end{equation} \tag{ 8 }$

Since a kernel function can be interpreted as a smooth function apply on the inner product form between samples, i.e. $\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)=f\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)$ , the Taylor series expansion of function f with the basic element ${\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}$ provides an approximation to various nonlinear kernels, seen the appendix A. Based on lemma 2, the (8) can be transformed into |K⟩ in time $\mathcal{O}\left({{\epsilon}}^{-1}\text{poly}\left(\mathrm{log}\left(N\right)\right)\right)$ to accuracy . Considering the general case of N ≫ D, the complexity for preparing the quantum state |K⟩ is given as $\mathcal{O}\left({{\epsilon}}^{-1}\text{poly}\left(\mathrm{log}\left(ND\right)\right)\right)$ . □

Theorem 2. For the arbitrary kernel matrix K, there exists a quantum algorithm to implement its Hamiltonian simulation e^{−iK
t} within a fixed error .

Proof. The Taylor expansion of the kernel function κ( x _i, x _j) at the zero is written as:

$\begin{equation}f\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)=f\left(0\right)+{f}^{\prime }\left(0\right){\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}+\cdots +\frac{{f}^{\left(n-1\right)}\left(0\right)}{n-1!}{\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{n-1}+{R}_{n}\end{equation} \tag{ 9 }$

in which, R_n is the remainder after n terms. Hence, for the arbitrary kernel matrix K_i,j = [κ( x _i, x _j)], it can be approximated by a linear combination of Hermitian matrices by truncating at order R:

$\begin{equation}\mathbf{K}\approx \sum _{r=0}^{R}{\alpha }_{r}{\mathbf{K}}_{r}\end{equation} \tag{ 10 }$

where α_r = f^(r)(0)/r! is the corresponding coefficient of ${\mathbf{K}}_{r}=\left[{\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{r}\right]$ for r = 0, 1, ..., R.

Obviously, the K₀ is an all-ones matrix, which can be decomposed into the sum of N unitary matrices:

$\begin{equation}{\mathbf{K}}_{0}=\left[\begin{matrix}\hfill 1\hfill & \hfill 0\hfill & \hfill \dots \hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill \dots \hfill & \hfill 0\hfill \\ \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill \ddots \hfill & \hfill {\vdots}\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill \dots \hfill & \hfill 1\hfill \end{matrix}\right]+\left[\begin{matrix}\hfill 0\hfill & \hfill 1\hfill & \hfill \dots \hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill \ddots \hfill & \hfill {\vdots}\hfill \\ \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill \dots \hfill & \hfill 1\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill \dots \hfill & \hfill 0\hfill \end{matrix}\right]+\cdots +\left[\begin{matrix}\hfill 0\hfill & \hfill 0\hfill & \hfill \dots \hfill & \hfill 1\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill \dots \hfill & \hfill 0\hfill \\ \hfill {\vdots}\hfill & \hfill \ddots \hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill \\ \hfill 0\hfill & \hfill \dots \hfill & \hfill 1\hfill & \hfill 0\hfill \end{matrix}\right]=\sum _{j=0}^{N-1}{\mathbf{V}}_{j}.\end{equation} \tag{ 11 }$

The unitary matrix V_j is given as ${\sum }_{k=0}^{N-1}\vert \left(k-j\right)\enspace \mathrm{mod}\enspace N\rangle \langle k\vert$ . Since V_j are with a uniform probability distribution, according to lemma 3, the ${\mathrm{e}}^{-\mathrm{i}{\mathbf{K}}_{0}t}$ can be implemented using the $\mathcal{O}\left(t\left(\mathrm{log}\left(t/{\epsilon}\right)/\mathrm{log}\enspace \mathrm{log}\left(t/{\epsilon}\right)\right)\right)$ calls of controlled H^{⊗ log N} gates⁴ and $\mathcal{O}\left(t\left(\mathrm{log}\enspace N\right)\left(\mathrm{log}\left(t/{\epsilon}\right)/\mathrm{log}\enspace \mathrm{log}\left(t/{\epsilon}\right)\right)\right)$ additional one- and two-qubit gates. Therefore, the all-ones matrix K₀ can be simulated in $\mathcal{O}\left(t\left(\mathrm{log}\enspace N\right)\left(\mathrm{log}\left(t/{\epsilon}\right)/\mathrm{log}\enspace \mathrm{log}\left(t/{\epsilon}\right)\right)\right)$ to accuracy ⁵ .

For the arbitrary degree polynomial kernel K_r for r = 1, 2, ..., R, the corresponding Hamiltonian simulations ${\text{e}}^{-\text{i}{\mathbf{K}}_{r}t}$ can be implemented based on lemma 4. Without loss of generality, the quantum state |X^r⟩ is firstly prepared as:

$\begin{equation}\vert {\mathbf{X}}^{r}\rangle =\frac{1}{\sqrt{{\sum }_{i=1}^{N}{\Vert}{\mathbf{X}}_{i{\bullet}}^{r}{{\Vert}}_{2}^{2}}}\sum _{i=1}^{N}{\Vert}{\mathbf{X}}_{i{\bullet}}^{r}{{\Vert}}_{2}\vert i\rangle \vert {\mathbf{X}}_{i{\bullet}}^{r}\rangle \end{equation} \tag{ 12 }$

where $\vert {\mathbf{X}}_{i{\bullet}}^{r}\rangle$ is the abbreviation of the r-times tensor product $\underset{r}{\underbrace{\vert {\mathbf{X}}_{i{\bullet}}\rangle \otimes \dots \otimes \vert {\mathbf{X}}_{i{\bullet}}\rangle }}$ and ${\Vert}{\mathbf{X}}_{i{\bullet}}^{r}{{\Vert}}_{2}$ denotes the corresponding 2-norm. Next, performing the partial trace operation on the second register of |X^r⟩:

$\begin{equation}{\mathrm{tr}}_{2}\left\{\vert {\mathbf{X}}^{r}\rangle \langle {\mathbf{X}}^{r}\vert \right\}=\frac{1}{{\sum }_{i=1}^{N}{\Vert}{\mathbf{X}}_{i{\bullet}}^{r}{{\Vert}}_{2}^{2}}\sum _{i,j=1}^{N}\langle {\mathbf{X}}_{i{\bullet}}^{r}\vert {\mathbf{X}}_{j{\bullet}}^{r}\rangle {\Vert}{\mathbf{X}}_{i{\bullet}}^{r}{{\Vert}}_{2}{\Vert}{\mathbf{X}}_{j{\bullet}}^{r}{{\Vert}}_{2}\vert i\rangle \langle j\vert =\frac{{\mathbf{K}}_{r}}{\mathrm{tr}\left\{{\mathbf{K}}_{r}\right\}}={\hat{\mathbf{K}}}_{r}\end{equation} \tag{ 13 }$

where ${\hat{\mathbf{K}}}_{r}$ is the normalized kernel matrix of K_r. Following the lemma 4, the Hamiltonian simulation ${\mathrm{e}}^{-\mathrm{i}{\mathbf{K}}_{r}t}$ can be implemented to accuracy in time $\mathcal{O}\left({t}^{2}{{\epsilon}}^{-1}\text{poly}\left(\mathrm{log}\enspace ND\right)\right)$ .

Since each individual Hamiltonian K_r can be simulated efficiently, it is intuitive to simulate the Hamiltonian K based on the Lie–Trotter product formula:

$\begin{equation}\enspace \underset{n\to \infty }{\mathrm{lim}\enspace }{\left({\mathrm{e}}^{-\mathrm{i}{\mathbf{K}}_{0}t/n}\enspace {\mathrm{e}}^{-\mathrm{i}{\mathbf{K}}_{1}t/n}\dots {\mathrm{e}}^{-\mathrm{i}{\mathbf{K}}_{R}t/n}\right)}^{n}={\mathrm{e}}^{-\mathrm{i}\mathbf{K}t}.\end{equation} \tag{ 14 }$

In particular, the Hamiltonian simulation with the form of (10) has been well-studied in the past years, which is categorized as the divide and conquer approach. Hence, with the efficient simulation of each part in K, the e^{−iK
t} can be implemented with splitting formulas [34–36]. □

Based on the above two theorems, the QKPCA is formulated step by step and its overall circuit is shown in figure 1(a).

**Figure 1.** (a) Overall circuit of the QKPCA algorithm, which is divided into five parts and each part is marked by the corresponding number. The first part is the QPE with the unitary operator e^{iK
t}. The second part is employed to compute the rotation angle. The third part is implementing the controlled rotation, which is to assign the quantum state to the amplitude. The fourth part is used to uncompute the intermediate registers, and the last part is the measurement operation. (b) The expanded circuit for QPE U_PE. (c) Circuit for computing the rotation angle and implementing controlled rotation, where the red dashed box is symbolized as ${U}_{\mathcal{G}}$ and green dashed box is the expanded circuit of controlled rotation operation ${\mathbf{C}\mathbf{R}}_{\mathcal{Y}}\left(2{\omega }_{j}\pi \right)$ .
Download figure:
Standard image High-resolution image

At the very beginning, the ith row of matrix K is denoted as ${\mathbf{K}}_{i{\bullet}}=\left[\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{1}\right)\enspace \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{2}\right)\enspace \dots \enspace \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{N}\right)\right]$ . Hence, the quantum state |K⟩ can be rewritten as:

$\begin{equation}\vert \mathbf{K}\rangle =\frac{1}{{\Vert}\mathbf{K}{{\Vert}}_{F}}\sum _{i=1}^{N}\sum _{j=1}^{N}\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)\vert i\rangle \vert j\rangle =\frac{1}{{\Vert}\mathbf{K}{{\Vert}}_{F}}\sum _{i=1}^{N}{\Vert}{\mathbf{K}}_{i}{{\Vert}}_{2}\vert i\rangle \vert {\mathbf{K}}_{i{\bullet}}\rangle \end{equation} \tag{ 15 }$

where ||K_i||₂ denotes the 2-norm of vector K_i• and quantum state $\vert {\mathbf{K}}_{i{\bullet}}\rangle ={\sum }_{j=1}^{N}\frac{\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)}{{\Vert}{\mathbf{K}}_{i{\bullet}}{{\Vert}}_{2}}\vert j\rangle$ . It should be mentioned that the transposition of K_i• can be represented as follows:

$\begin{equation}\left[\begin{matrix}\hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{1}\right)\hfill \\ \hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{2}\right)\hfill \\ \hfill {\vdots}\hfill \\ \hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{N}\right)\hfill \end{matrix}\right]={\lambda }_{1}{v}_{1}^{i}\vert {\boldsymbol{v}}_{1}\rangle +{\lambda }_{2}{v}_{2}^{i}\vert {\boldsymbol{v}}_{2}\rangle +\cdots +{\lambda }_{N}{v}_{N}^{i}\vert {\boldsymbol{v}}_{N}\rangle .\end{equation} \tag{ 16 }$

Therefore, the quantum state |K_i•⟩ can be spanned by the eigenvectors $\left\{\vert {\boldsymbol{v}}_{1}\rangle ,\vert {\boldsymbol{v}}_{2}\rangle ,\dots ,\vert {\boldsymbol{v}}_{N}\rangle \right\}$ :

$\begin{equation}\vert {\mathbf{K}}_{i{\bullet}}\rangle =\frac{1}{{\Vert}{\mathbf{K}}_{i{\bullet}}{{\Vert}}_{2}}\sum _{j=1}^{N}\kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)\vert j\rangle =\frac{1}{{\Vert}{\mathbf{K}}_{i{\bullet}}{{\Vert}}_{2}}\sum _{j=1}^{N}{\lambda }_{j}{v}_{j}^{i}\vert {\boldsymbol{v}}_{j}\rangle .\end{equation} \tag{ 17 }$

Taking |K⟩ as the input quantum state and performing the quantum phase estimation (QPE) on the register |K_i•⟩:

$\begin{equation}\vert \mathbf{K}\rangle \vert {0\rangle }^{\otimes l}{\mathbf{U}}_{\mathbf{PE}}{{\longmapsto}}\frac{1}{{\Vert}\mathbf{K}{{\Vert}}_{F}}\sum _{i=1}^{N}\sum _{j=1}^{N}\vert i\rangle {\lambda }_{j}{v}_{j}^{i}\vert {\boldsymbol{v}}_{j}\rangle \vert {\lambda }_{j}\rangle .\end{equation} \tag{ 18 }$

In above transformation, the unitary operation U_PE can be implemented by the circuit in figure 1(b) and defined as follows:

$\begin{equation}{\mathbf{U}}_{\mathbf{PE}}=\left({\mathbf{Q}\mathbf{F}\mathbf{T}}^{{\dagger}}\otimes {\mathbf{I}}_{N{\times}N}\right)\left(\sum _{\tau =0}^{L-1}\vert \tau \rangle \langle \tau \vert \otimes {\text{e}}^{\text{i}\mathbf{K}t\frac{\tau }{L}}\right)\left({\mathbf{H}}^{\otimes L}\otimes {\mathbf{I}}_{N{\times}N}\right)\end{equation} \tag{ 19 }$

where QFT^† is the quantum Fourier inverse transform, I_N×N is an N × N identity matrix, and L = 2^l.

Then, employing the quantum arithmetic operation ${U}_{\mathcal{G}}$ to compute the rotation angle ω_j. As shown in the red dashed box of figure 1(c), the operation ${U}_{\mathcal{G}}$ consists of a square root module and an arccot module, their detailed implementable circuit has been proposed in [37, 38] respectively. After that, adding an auxiliary quantum bit initialized as |0⟩ and applying the controlled rotation operation ${\mathbf{C}\mathbf{R}}_{\mathcal{Y}}\left(2{\omega }_{j}\pi \right)$ to obtain the following quantum state (the expanded circuit of ${\mathbf{C}\mathbf{R}}_{\mathcal{Y}}\left(2{\omega }_{j}\pi \right)$ is shown in the green dashed box of figure 1(c)):

$\begin{equation}\frac{1}{{\Vert}\mathbf{K}{{\Vert}}_{F}}\sum _{i=1}^{N}\sum _{j=1}^{N}\vert i\rangle {\lambda }_{j}{v}_{j}^{i}\vert {\boldsymbol{v}}_{j}\rangle \vert {\lambda }_{j}\rangle \vert {\omega }_{j}\rangle \left(\sqrt{1-\frac{1}{{\lambda }_{j}}}\vert 0\rangle +\frac{1}{\sqrt{{\lambda }_{j}}}\vert 1\rangle \right).\end{equation} \tag{ 20 }$

Next, applying ${U}_{\mathcal{G}}^{{\dagger}}$ and inverse QPE ${\mathbf{U}}_{\mathbf{PE}}^{{\dagger}}$ to uncompute the rotation angle register and eigenvalue register respectively, that is:

$\begin{equation}\frac{1}{{\Vert}\mathbf{K}{{\Vert}}_{F}}\sum _{i=1}^{N}\sum _{j=1}^{N}\vert i\rangle {\lambda }_{j}{v}_{j}^{i}\vert {\boldsymbol{v}}_{j}\rangle \left(\sqrt{1-\frac{1}{{\lambda }_{j}}}\vert 0\rangle +\frac{1}{\sqrt{{\lambda }_{j}}}\vert 1\rangle \right).\end{equation} \tag{ 21 }$

Now, performing the quantum measurement on the ancillary qubit in computational basis. With the success probability ${\sum }_{j=1}^{N}{\lambda }_{j}^{-1}$ , the following state is acquired:

$\begin{equation}\frac{1}{\mathcal{C}}\sum _{i=1}^{N}\sum _{j=1}^{N}\vert i\rangle \sqrt{{\lambda }_{j}}{v}_{j}^{i}\vert {\boldsymbol{v}}_{j}\rangle \end{equation} \tag{ 22 }$

where $\mathcal{C}=\sqrt{{\sum }_{i,j=1}^{N}{\left(\sqrt{{\lambda }_{j}}{v}_{j}^{i}\right)}^{2}}$ is the normalization coefficient. Finally, recalling the sample mapping | v _j⟩ ↦ |j⟩ constructed in [16], the same operation can be applied in the above equation. Hence, the final state is written as:

$\begin{equation}\frac{1}{\tilde {\mathcal{C}}}\sum _{i=1}^{N}\sum _{j=1}^{d}\vert i\rangle \sqrt{{\lambda }_{j}}{v}_{j}^{i}\vert j\rangle \propto \vert \mathbf{Z}\rangle .\end{equation} \tag{ 23 }$

In which, $\tilde {\mathcal{C}}=\sqrt{{\sum }_{i=1}^{N}{\sum }_{j=1}^{d}{\left(\sqrt{{\lambda }_{j}}{v}_{j}^{i}\right)}^{2}}$ . Obviously, the output state is proportional to the target quantum state |Z⟩.

3.2. Projection on the new data

According to the QKPCA algorithm, the dataset X can be mapped into the projected Z efficiently on a universal quantum computer. Now, considering the new input sample or test sample $\boldsymbol{x}={\left[{x}^{1}\enspace {x}^{2}\enspace \dots \enspace {x}^{D}\right]}^{\mathrm{T}}$ and the aim is projecting it into the target sample $\boldsymbol{z}={\left[{z}^{1}\enspace {z}^{2}\enspace \dots \enspace {z}^{d}\right]}^{\mathrm{T}}$ . A modification should be performed on the QKPCA. At first, the vector κ_x is defined as ${\left[\kappa \left(\boldsymbol{x},{\boldsymbol{x}}_{1}\right)\enspace \kappa \left(\boldsymbol{x},{\boldsymbol{x}}_{2}\right)\enspace \dots \enspace \kappa \left(\boldsymbol{x},{\boldsymbol{x}}_{N}\right)\right]}^{\mathrm{T}}$ and the corresponding quantum state is prepared as $\vert {\kappa }_{\boldsymbol{x}}\rangle =\frac{1}{{\Vert}{\kappa }_{\boldsymbol{x}}{{\Vert}}_{2}}{\sum }_{i=1}^{N}\kappa \left(\boldsymbol{x},{\boldsymbol{x}}_{i}\right)\vert i\rangle$ . Taking the |κ_x⟩ as the input quantum state and performing the QPE on it:

$\begin{equation}\frac{1}{\sqrt{{\sum }_{i=1}^{N}{\beta }_{i}^{2}}}\sum _{i=1}^{N}{\beta }_{i}\vert {\boldsymbol{v}}_{i}\rangle \vert {\lambda }_{i}\rangle \end{equation} \tag{ 24 }$

in which, β_i = ⟨ v _i|κ_x⟩. The following steps is similar to the procedure as mentioned previously. The yielding quantum state is proportional to the target quantum state | z ⟩.

3.3. QKPCA with centering

It should be noticed that the above QKPCA algorithm is executed under the assumption that the data in feature space are zero-mean. However, this assumption does not hold true in the general case, which leads to a slight difference in the construction of the kernel matrix. According to [28], the kernel matrix K should be centered as K' = CKC, where the centering matrix C is defined as:

$\begin{equation}\mathbf{C}={\mathbf{I}}_{N{\times}N}-\boldsymbol{e}{\boldsymbol{e}}^{\mathrm{T}}={\mathbf{I}}_{N{\times}N}-\frac{1}{N}{\mathbf{1}}_{N{\times}N}.\end{equation} \tag{ 25 }$

In which, the vector e = N^−1/2[1 1 ... 1]^T and 1_N×N is an all-ones matrix. To evaluate the eigenvalue of K' in the QPE procedure, the unitary operation e^{−iK
t} in section 3.1 should be replaced by e^{−iK
t}. Hence, the key is to construct the efficient Hamiltonian simulation of centering kernel matrix K'.

To the efficient Hamilton simulation e^{−iC
t}, the centering matrix C is first regarded as the sum of I_N×N and $-\frac{1}{N}{\mathbf{1}}_{N{\times}N}$ . It is apparent that both of them are the real symmetric matrices and satisfy the commutation condition $\left[{\mathbf{I}}_{N{\times}N},-\frac{1}{N}{\mathbf{1}}_{N{\times}N}\right]=0$ . Therefore, it is not hardly to conclude:

$\begin{equation}{\text{e}}^{-\text{i}\mathbf{C}t}={\text{e}}^{-\text{i}{\mathbf{I}}_{N{\times}N}t}\enspace {\text{e}}^{\text{i}\left(\frac{1}{N}{\mathbf{1}}_{N{\times}N}\right)t}.\end{equation} \tag{ 26 }$

The first item ${\text{e}}^{-\text{i}{\mathbf{I}}_{N{\times}N}t}$ is a trivial diagonal matrix and the second item ${\text{e}}^{\text{i}\left(\frac{1}{N}{\mathbf{1}}_{N{\times}N}\right)t}$ can also be implemented based on the lemma 3 (the detailed approach seen the proof of theorem 2). According to [9], it is possible to construct the density operator to error using the Hermitian chain product:

$\begin{equation}\left[{f}_{k}\left({\mathbf{A}}_{k}\right)\dots {f}_{1}\left({\mathbf{A}}_{1}\right)\right]{\left[{f}_{k}\left({\mathbf{A}}_{k}\right)\dots {f}_{1}\left({\mathbf{A}}_{1}\right)\right]}^{{\dagger}}\end{equation} \tag{ 27 }$

where A₁, ..., A_k are arbitrary N × N Hermitian positive semi-definite matrices and functions f₁, ..., f_k are with convergent Taylor series. Specifically, substituting A₁ = K, A₂ = C, f₁(X) = X^1/2 and f₂(X) = X into the above equation, the K' can be obtained as density operation ρ_K'. Finally, using the Hamiltonian simulation technique proposed in [7], the e^{−iK
t} can be simulated efficiently. The above procedure can be implemented with the circuit in figure 2. In which, the initial state is defined as $\vert {\psi }_{0}\rangle ={\sum }_{l=0}^{L-1}\enspace \mathrm{sin}\enspace \frac{\pi \left(l+1/2\right)}{T}\vert l\rangle$ and the input density operator is given as ${\rho }_{0}=\frac{1}{\sqrt{N}}{\sum }_{i=1}^{N}\vert i\rangle \langle i\vert$ .

**Figure 2.** Quantum circuit for constructing the density operator of centering kernel matrix using the Hermitian chain product.
Download figure:
Standard image High-resolution image

3.4. Extension to other nonlinear DR methods

It has been demonstrated that there is a strong connection between the KPCA and a variety of nonlinear DR methods [2]. As a result of the QKPCA algorithm provided a whole quantum framework for performing KPCA, it is intuitive to formulate the quantum version of other nonlinear DR algorithms by modifying QKPCA or employing it as a subroutine. In this section, the extensions to other nonlinear DR methods based on QKPCA are discussed.

3.4.1. Quantum kernel discriminant analysis

In addition to the PCA, the other frequently used technique in DR is Fisher's linear discriminant analysis (LDA). Unlike the PCA, as a supervised method, LDA takes both the between-class and within-class variances into consideration. In order to remove the linear constraint in LDA, the kernel generalization of LDA (KFD) is developed in [39, 40]. The quantum generalization of LDA (QLDA) is proposed by Cong et al [9], where an exponential acceleration is provided comparing with the classical counterpart. Motivated by [41, 42], the KFD can be interpreted as KPCA plus LDA. Since the output quantum state of QKPCA can be regarded as the input quantum state of QLDA, the quantization of KFD (QKFD) can be implemented with a two-step process including QKPCA and QLDA.

3.4.2. Quantum manifold learning

Besides, an alternative way to nonlinear DR is manifold learning, which is under the assumption that the high-dimensional space may be aligned to a manifold in a low-dimensional space. The most representative models in manifold learning including isometric mapping (Isomap), locally linear embedding, Laplacian eigenmaps, and so on. As presented in [43], the above popular manifold learning algorithms can be interpreted as KPCA formulations with different kernel definitions.

Let illustrating with the classical Isomap [44], it can be reformulated as solving the eigenvalue problem on ${\mathbf{K}}_{\text{Isomop}}=-\frac{1}{2}\mathbf{C}{\mathbf{S}}^{\mathcal{G}}\mathbf{C}$ , wherein the ${\mathbf{S}}^{\mathcal{G}}=\left[{\left({d}_{ij}^{\mathcal{G}}\right)}^{2}\right]$ is the geodesic distance matrix and the pairwise geodesic distance ${d}_{ij}^{\mathcal{G}}$ is estimated by performing shortest path search on the neighborhood graph $\mathcal{G}$ ⁶ . The core task is obtaining the geodesic distance matrix ${\mathbf{S}}^{\mathcal{G}}$ with quantum speedup and constructing its efficient Hamiltonian simulation. Firstly, the Euclidean distance can be computed by quantum swap test module, which costs $\mathcal{O}\left(\frac{N\left(N-1\right)}{2}\enspace {\mathrm{log}}_{2}\enspace D\right)$ on the overall dataset X. Next, as presented in the [45], the complexity of constructing the neighborhood graph is given as $\mathcal{O}\left(N\sqrt{cN}\right)$ where c is the number of neighbor. After that, the estimation of pairwise geodesic distance ${d}_{ij}^{\mathcal{G}}$ is considered as the all pairs shortest paths (APSP) problem. In [46, 47], with the Grover's algorithm, the APSP can be solved with quantum setting running in $\mathcal{O}\left({N}^{2.5-\varepsilon }\right)$ time for ɛ > 0 while $\mathcal{O}\left({N}^{3-\varepsilon }\right)$ time in the classical algorithms. Owing to the geodesic distance matrix ${\mathbf{S}}^{\mathcal{G}}$ is not Hermitian generally, we can construct the extended matrix ${\tilde {\mathbf{S}}}^{\mathcal{G}}$ as follows:

$\begin{equation}{\tilde {\mathbf{S}}}^{\mathcal{G}}=\vert 0\rangle \langle 1\vert \otimes {\mathbf{S}}^{\mathcal{G}}+\vert 1\rangle \langle 0\vert \otimes {\left({\mathbf{S}}^{\mathcal{G}}\right)}^{{\dagger}}=\left[\begin{matrix}\hfill 0\hfill & \hfill {\mathbf{S}}^{\mathcal{G}}\hfill \\ \hfill {\left({\mathbf{S}}^{\mathcal{G}}\right)}^{{\dagger}}\hfill & \hfill 0\hfill \end{matrix}\right].\end{equation} \tag{ 28 }$

With the Hamiltonian simulation technique proposed in [48], the extend matrix ${\tilde {\mathbf{S}}}^{\mathcal{G}}$ can be simulated efficiently in time $\mathcal{O}\left(\frac{{\Vert}{\mathbf{S}}^{\mathcal{G}}{{\Vert}}_{\mathrm{max}}^{2}{t}^{2}}{{\epsilon}}\text{poly}\left(\mathrm{log}\enspace N\right)\right)$ with precision . Therefore, the quantum Isomap can be implemented with at least quadratic acceleration under the basic framework of QKPCA.

4. Discussion

In fact, several existing algorithms are also intended to introduce nonlinearity in QML. As presented in lemma 4, the polynomial kernel with an arbitrary degree is proposed based on the tensor product of specific quantum states. As for the widely used Gaussian kernel, its quantum version is proposed in [49] and then employed in the training radial basis function network with quantum speedup [50]. However, this approach is formulated with the generalized coherent states (the language of quantum optics) rather than the universal quantum computing paradigm. In addition, the proposed quantum subroutine (i.e. Hamiltonian simulation of arbitrary kernel matrix) could be useful independently, and we hope for substantial improvements in several other kernel-based machine learning algorithms. Exampling with two well-known linear QML models, quantum support vector machine [6] and quantum linear regression [8], the general nonlinearity can be introduced by replacing the controlled unitary operation in the QPE with the e^{−iK
t}.

However, it should be mentioned that the preparation of the input quantum state is considered as a key bottleneck in the QKPCA algorithm, which is similar to many other QML algorithms. Although it can be achieved by QRAM theoretically, there is no general implementation in quantum hardware to date. Fortunately, one of recent work has proven possible the practical implementation of the QRAM oracle [51]. In which, Connor Hann et al presented a hardware efficient implementation of QRAM using the multi-mode circuit quantum acoustodynamic (cQAD) system. The physical resource requirement can be reduced drastically comparing with original cavity-implementation, which is due to the compactness of multi-mode cQAD systems. In addition, there are only logarithmic routing components are needed to be 'active' while the others can be considered as 'non-active' and 'error-free' in the QRAM as demonstrated in [52]. Therefore, given a certain error model, algorithms that require to query the memory a polynomial number of times (like the quantum linear system algorithm) might not require fault-tolerant components.

5. Conclusion and future works

In this paper, a novel approach to implement the efficient Hamiltonian simulation of the arbitrary kernel matrix is proposed. According to the truncated Taylor expansion, the arbitrary kernel matrix can be approximated as the two parts: the all-ones matrix and polynomial kernel matrices with coefficients. As a consequence of the efficient Hamiltonian simulation for each part has been derived, the whole kernel matrix is simulated by the divide and conquer approach. Based on it, the QKPCA algorithm is developed, which can be employed to perform nonlinear DR on a universal quantum computer with exponential acceleration. More importantly, the QKPCA can be regarded as a common framework to explore the quantization of other nonlinear DR techniques, like QKFD and some quantum manifold learning algorithms. Besides, with the efficient Hamiltonian simulation on the arbitrary kernel matrix, it is promising to introduce general nonlinearity to the existing linear QML models.

Clearly, although the proposal can be executed on a universal quantum computer efficiently, the fully coherent evolution of the existing quantum devices still remains a fundamental challenge in the short term. One possible alternative is the variational quantum computation which can be implemented on the noisy intermediate-scale quantum devices. Hence, one of the future works is focused on the study of constructing the various quantum nonlinear DR algorithms with variational quantum computation. Besides, applying the proposed algorithms to the real data will be interesting, which may be the other direction in future works.

Acknowledgments

This work is supported by the National Key R&D Plan under Grant Nos. 2018YFC1200200 and 2018YFC1200205; Shanghai Science and Technology Project in 2020 under Grant No. 20040501500; National Natural Science Foundation of China under Grant Nos. 61763014 and No. 62062035; the Fund for Distinguished Young Scholars of Jiangxi Province under Grant No. 2018ACB21013; Natural Science Foundation of Jiangxi Province of China under Grant No. 20192BAB207014; Science and technology research project of Jiangxi Provincial Education Department under Grant No. GJJ190297.

Appendix A.: The Taylor series expansion of the three kernels

As mentioned previously, the Taylor series expansion with the basic element ${\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}$ provides a potential solution to estimate various nonlinear kernels. The expanded forms toward the final three kernels in table 1 are described as follow:

(1) Gaussian kernel

$\begin{align}\hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)& =\mathrm{exp}\left(-\lambda \;{\ast}\;{\Vert}{\boldsymbol{x}}_{i}-{\boldsymbol{x}}_{j}{{\Vert}}^{2}\right)=\mathrm{exp}\left(-\lambda \;{\ast}\;\left({\Vert}{\boldsymbol{x}}_{i}{{\Vert}}^{2}+{\Vert}{\boldsymbol{x}}_{j}{{\Vert}}^{2}-2\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)\right)\right)=\mathrm{exp}\left(-2\lambda +2\lambda \;{\ast}\;\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)\right)\hfill \\ \hfill & =\frac{1}{\mathrm{exp}\left(2\lambda \right)}+\frac{\left(2\lambda \right)}{\mathrm{exp}\left(2\lambda \right)}\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)+\frac{\left(2{\lambda }^{2}\right)}{\mathrm{exp}\left(2\lambda \right)}{\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{2}+\frac{\left(4{\lambda }^{3}\right)}{3\enspace \mathrm{exp}\left(2\lambda \right)}{\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{3}+\cdots \hfill \end{align} \tag{ A.1 }$

(2) Exponential kernel

$\begin{align}\hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)& =\mathrm{exp}\left(-\lambda \;{\ast}\;{\Vert}{\boldsymbol{x}}_{i}-{\boldsymbol{x}}_{j}{\Vert}\right)=\mathrm{exp}\left(-\lambda \;{\ast}\;\sqrt{2-2\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}\right)=\mathrm{exp}\left(-\sqrt{2}\lambda \;{\ast}\;\sqrt{1-\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}\right)\hfill \\ \hfill & =\frac{1}{\mathrm{exp}\left(\sqrt{2}\lambda \right)}+\frac{\sqrt{2}\lambda \left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}{2\enspace \mathrm{exp}\left(\sqrt{2}\lambda \right)}+\frac{\left(\sqrt{2}\lambda +2{\lambda }^{2}\right){\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{2}}{8\enspace \mathrm{exp}\left(\sqrt{2}\lambda \right)}+\frac{\left(3\sqrt{2}\lambda +6{\lambda }^{2}+2\sqrt{2}{\lambda }^{3}\right){\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{3}}{48\enspace \mathrm{exp}\left(\sqrt{2}\lambda \right)}+\cdots \hfill \end{align} \tag{ A.2 }$

(3) Sigmoid kernel

$\begin{align}\hfill \kappa \left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)& =\mathrm{tanh}\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}+c\right)\hfill \\ \hfill & =\mathrm{tanh}\enspace c+\left(-{\mathrm{tanh}}^{2}\enspace c+1\right)\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)+\left({\mathrm{tanh}}^{3}\enspace c-\mathrm{tanh}\enspace c\right){\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{2}\hfill \\ \hfill & \quad -\enspace \frac{\left(3\enspace {\mathrm{tanh}}^{4}\enspace c-4\enspace {\mathrm{tanh}}^{2}\enspace c+1\right){\left({\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{x}}_{j}\right)}^{3}}{3}+\cdots \hfill \end{align} \tag{ A.3 }$

Furthermore, to make an intuitive understanding on the approximation method, the corresponding quantitative analyses are also presented. We denote the $\tilde {\kappa }\left({\boldsymbol{x}}_{i},{\boldsymbol{x}}_{j}\right)$ as the approximation result and κ( x _i, x _j) is given as the corresponding true value. As shown in the figure A1, the error of this approximation kernel function decreases with factorial growth if the approximation order is linearly increased.

**Figure A1.** (a)–(c) represent the approximation result of Gaussian kernel, exponential kernel and sigmoid kernel respectively. In each subfigure, the red line is refer to the approximation up to the order 5, the magenta line represents the approximation up to the order 7, the cyan line denotes the approximation up to the order 9 and the true value is shown with the blue line.
Download figure:
Standard image High-resolution image

Quantum algorithm for the nonlinear dimensionality reduction with arbitrary kernel

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Preliminaries

2.1. Quantum computation background

2.2. Kernel principal component analysis

3. Quantum algorithm for nonlinear DR

3.1. Quantum kernel principal component analysis

3.2. Projection on the new data

3.3. QKPCA with centering

3.4. Extension to other nonlinear DR methods

3.4.1. Quantum kernel discriminant analysis

3.4.2. Quantum manifold learning

4. Discussion

5. Conclusion and future works

Acknowledgments

Appendix A.: The Taylor series expansion of the three kernels

Footnotes

Quantum algorithm for the nonlinear dimensionality reduction with arbitrary kernel

Article metrics

Submit

Permissions

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Preliminaries

2.1. Quantum computation background

2.2. Kernel principal component analysis

3. Quantum algorithm for nonlinear DR

3.1. Quantum kernel principal component analysis

3.2. Projection on the new data

3.3. QKPCA with centering

3.4. Extension to other nonlinear DR methods

3.4.1. Quantum kernel discriminant analysis

3.4.2. Quantum manifold learning

4. Discussion

5. Conclusion and future works

Acknowledgments

Appendix A.: The Taylor series expansion of the three kernels

Footnotes