A hybrid quantum regression model for the prediction of molecular atomization energies

Quantum machine learning is a relatively new research field that aims to combine the dramatic performance advantage offered by quantum computing and the ability of machine learning algorithms to learn complex distributions of high-dimensional data. The primary focus of this domain is the implementation of classical machine learning algorithms in the quantum mechanical domain and study of the speedup due to quantum parallelism, which could enable the development of novel techniques for solving problems such as quantum phase recognition and quantum error correction optimization. In this paper, we propose a hybrid quantum machine learning pipeline for predicting the atomization energies of various molecules using the nuclear charges and atomic positions of the constituent atoms. Firstly, we will be using a deep convolutional auto-encoder model for the feature extraction of data constructed from the eigenvalues and eigenvector centralities of the pairwise distance matrix calculated from atomic positions and the unrolled upper triangle of each Coulomb matrix calculated from nuclear charges, and we will then be using a quantum regression algorithm such as quantum linear regression, quantum radial basis function neural network and, a quantum neural network for estimating the atomization energy. The hybrid quantum neural network models do not seem to provide any speedup over their classical counterparts. Before implementing a quantum algorithm, we will also be using state-of-the-art classical machine learning and deep learning models such as XGBoost, multilayer perceptron, deep convolutional neural network, and a long short-term memory network to study the correlation between the extracted features and corresponding atomization energies of molecules.


Introduction
In recent years, machine learning and data-driven analysis methods have had a significant impact in various domains of science and technology, including medicine [1,2], pharmaceutical [3], and chemical industries [4]. Efficient and accurate estimation of various fundamental chemical properties such as atomization energy of molecules is very crucial for the design and discovery of new compounds and drugs. Such tasks often require very computationally intensive simulations and calculations to derive empirical relations between various chemical parameters. However, machine learning algorithms and deep learning algorithms [5][6][7] have been proven to be very effective in learning these hidden patterns behind the complex distributions of high-dimensional data. This coupled with the speedup offered by the quantum computers, thanks to fundamental quantum mechanical concepts like superposition and entanglement [8] could revolutionize the field of computational drug design.
One of the main challenges of applying machine learning for the estimation of chemical properties of a molecule [9][10][11] is to efficiently encode the molecule as an input to the model. In this paper, we will be using the QM7 dataset [12,13] that contains the Coulomb matrices, atomization energies, atomic charges and the Cartesian coordinates of atoms for a wide range of molecules. Similar to the previous works that have implemented machine learning methodologies using this dataset [14], we will try to map the atomization energies of the molecules using only the Coulomb matrices and the spatial locations of the atoms. In recent years, there has been an increasing interest in the study and development of novel quantum machine learning algorithms [15][16][17][18][19][20] and quantization of classical machine learning and deep learning models such as a quantum support vector machine [21][22][23], quantum neural networks [24][25][26], quantum convolutional neural networks (CNNs) [27], and quantum generative adversarial networks [28]. In this paper, our primary objective is to design and implement a hybrid quantum regression model where we use a classical deep learning model for feature extraction and a quantum regression algorithm for the prediction of atomization energies and study the feasibility of running these models on near term noisy intermediate-scale quantum computing (NISQ) systems.
We will be preparing our training dataset by unrolling only the upper triangular matrix of the Coulomb matrix since other elements of the matrix are redundant. We will also calculate a pairwise distance matrix using the Cartesian coordinates of the atoms and then calculate the eigenvalues and eigenvector centralities of the distance matrix. In section 2, we will use the concatenated arrays of unrolled Coulomb matrix, eigenvalues and eigenvector centralities for feature extraction to derive a compressed representation of the molecules. In section 3, we will be analyzing the correlation between the extracted features and the atomization energies of the molecules. Then in section 4, we will be looking at various quantum regression models before presenting the conclusions in section 5.

Methodology
The QM7 dataset is a part of the open-source database called quantum-machine that aims to facilitate the development of an efficient system for fast and accurate simulations of quantum-chemical systems from first principles. The QM7 dataset, in particular, is a dataset aimed toward the prediction of atomization energies from molecular geometry and is a subset of GDB-13, a database of 1 billion stable and synthetically accessible organic molecules. The dataset consists of 7165 molecules that are composed of 23 different atoms including heavy atoms like C, N, O, and S. The first component of the dataset is the Coulomb matrix which is a symmetric matrix of size 23 × 23 calculated from the nuclear charges of the atoms that constitute a particular molecule.
The Coulomb matrix is calculated as follows: where Z i is the nuclear charge of an atom i that is located at a position R i . To avoid redundancy, we will be taking the upper triangular matrix of the Coulomb matrix, including the diagonal and unrolling it to map a matrix of size 23 × 23 to an array of size 1 × 276. The Coulomb matrix is invariant to translation and rotation of the molecules. The dataset also consists of Cartesian coordinates of the atoms from which we have calculated the pairwise interatomic distance matrix. We have then unrolled the distance matrix by taking the upper triangular matrix of the matrix, excluding the diagonal matrix mapping a matrix of size 23 × 23 to an array of size 1 × 253. The pairwise interatomic distance matrix of a molecule is a two-dimensional square matrix containing the distances taken pairwise, between the atoms of a specific molecule .We have plotted the distribution of average interatomic distance of the molecules to observe the distribution, as shown in figure 1. We then calculate the eigenvalues and eigenvector centralities of the distance matrix to account for the intensity and direction of the spread of the data, which are essential in understanding the influence of an individual atom on the entire molecule. Considering the pairwise distance matrix to be an adjacency matrix with |V| vertices, we can calculate the eigenvalues and eigenvector centralities. The relative centrality score of vertex v (c v ) can be defined as where M(v) is the set of neighbours of v, λ is the greatest eigenvalue of D [29], and c t is the relative centrality score of vertex t ∈ G. Eigenvector centrality computes the centrality for a node based on the centrality of its neighbors, and it is a measure of a node's importance. The calculation of Eigenvector centrality is iterative and we use a normalized vector with equal values as a starting vector for the centrality scores. The size of eigenvalues and eigenvector centralities are 1 × 23 and 1 × 23, respectively. We then concatenate the flattened Coulomb matrix, pairwise distance matrix, eigenvalues, and eigenvector centralities, which gives a final array of size 1 × 575 for an individual molecule and a dataset of size 7165 × 575.  We have also plotted the atomization energies of the molecules to observe the distribution, as seen in figure 2.
We will now be using a deep convolutional auto-encoder for feature extraction to derive a compressed representation of our dataset since amplitude encoding of our preprocessed QM7 dataset requires a large number of qubits even though amplitude encoding is the most qubit efficient form of encoding. Autoencoder models consist of an encoder network and a decoder network. The encoder network learns to map the input samples to a latent vector and the decoder network learns to reconstruct the input from the latent dimension. Deep convolutional auto-encoders [30] have been used for feature extraction in various video and signal processing and analysis problems and have proven to be very effective in reconstructing the complex distributions of data [31,32]. The architecture of our Deep convolutional auto-encoder is shown in figure 3.
We have used 64, 32, 16, and 8 filters (kernels) of size (18,1) in the convolution layers and a pool size of (2,1) for the average pooling layers. Similarly, we have used 16, 32, and 64 filters (kernels) of size (18,1) in the transposed convolutional layers and a scale factor of (2,1) in the upsampling layers. We have used a linear activation function throughout the network and trained the model using a stochastic gradient descent optimizer [33][34][35] with a mean squared error loss function at a learning rate of 0.001. We have trained the model for 1000 epochs with a batch size of 100. The model has been implemented using the Keras library  We have taken the features from the bottleneck layer of the model, which gives us a feature vector of dimension 1 × 64 for each molecule. Hence we have mapped out a dataset of size 7165 × 575 to a compressed representation of size 7165 × 64. We have used metrics like correlation matrix, 2-component kernel PCA (principal component analysis) which is an extension of PCA that uses a kernel function (in our case, a radial basis function) to map the dataset into a lower-dimensional space, and a t-distributed stochastic neighbor  embedding to compare the actual data to the reconstructed data, and evaluate the performance of the model. Similar to kernel PCA, t-distributed stochastic neighbor embedding (t-SNE) is a non-linear dimensionality reduction technique. t-SNE minimizes the KL-divergence [36] between the distributions of the high-dimensional data and the mapped low-dimensional data.
The reconstructed results closely resemble the actual data, and the slight discrepancies in the Correlation matrix and K-PCA results can be attributed to the noise-induced in the reconstructed samples due to the reconstruction loss of the network. The results are shown in figures 5, 6, and 7.

Classical regression methods
Before applying quantum regression methods for predicting the atomization energy of the molecules using the extracted features, we will be looking at some classical regression methods such as XGBoost [37], multilayer perceptron (MLP) [38], deep convolutional neural network (CNN) [39] and a bidirectional long short-term memory recurrent neural network (LSTM)-RNN [41] (An implementation of a recurrent neural network using LSTM modules) model to study and confirm the correlation between the extracted features and corresponding atomization energies of molecules.
XGBoost is an open-source framework for gradient-boosted decision trees. An MLP is one of the most commonly used models that belong to the class of feedforward ANNs. Each neuron in one Layer of the MLP is connected to the neurons of the next layer via a non-linear activation function. An optimization algorithm called back-propagation makes MLPs very effective in learning and generalizing the distributions of data. Recurrent neural networks are widely used in various machine learning problems such as speech recognition [42], financial data analysis [43], time series forecasting [44], and time series classification [45]. The main feature that makes RNN highly versatile when working with them is their ability to learn sequences  in the data. CNN is primarily used for image and video data, but we can use a deep CNN model for our regression problem due to its ability to learn non-linear hierarchical structures and lower computational costs. The architectures of the models are shown in figure 8.
For the XGBoost model, we have used mean absolute error (MAE) for the evaluation metric, and a learning rate of 0.01 and the maximum depth of a tree set to 6. For the MLP, CNN, and LSTM models, we have used a rectified linear unit activation function in the hidden layers and a linear activation function in the output layer. We have used the ADAM optimizer [46] with a learning rate of 1×10 -4 and decay of 1×10 -5 . All three models have been trained for 10 000 epochs with a batch size of 128. The models have been implemented using the Keras library with Tensorflow backend and trained on a Google Colaboratory notebook with GPU (Nvidia Tesla K80) acceleration. We have used hold-out cross-validation with an 80-20 split to evaluate the performance of our models, and the results are presented in table 2.
The CNN model performs the best and the results of the MLP and LSTM model are comparable. The results of the LSTM model shows us that there is not any specific time dependence or sequential features in the input data. The results of the XGBoost model are not as good as what we see from the deep-learning-based models which is an expected behavior when studying high dimensional data. The MAE and the root mean squared error (RMSE) values are fairly low but are still far from current benchmark values of around 3 kcal mol −1 , this is due to the fact that we are using the extracted features for regression which is a

Quantum linear regression
When studying large volume datasets, current machine learning models often encounter a performance bottleneck when trained on classical systems [47], but thanks to the parallel computing power of quantum systems, we can solve this problem by developing quantum analogous models of classical machine learning algorithms. In this section, we will be implementing and comparing the results of a common curve fitting model called linear regression and quantum linear regression [48]. The primary advantage of the quantum linear regression method is the logarithmic computation time when computed on a quantum processor using quantum random access memory (qRAM) as opposed to the linear computation time of classical Vectorized Linear regression. Ewin Tang's work [49,50], which involves l 2 -norm sampling, demonstrated that for manipulating non-sparse low-rank matrices, quantum algorithms may be dequantized to reduce the exponential speedup to polynomial advantage. But, since we are dealing with a high-rank matrix, the quantum linear regression algorithm provides a significant performance advantage over its classical counterpart [51,52]. We will be using the extracted features for testing quantum regression models. We will be implementing the quantum linear regression model by encoding our features vectors as quantum states using sparse matrix representations. The feature vectors are concatenated into a matrix X, with each row of the matrix as a feature vector and the atomization energies will be stored in a vector Y ∈ R N * 1 . We will be using hold out cross-validation to split the data into 80% training and 20% testing/validation. Our implementation of quantum linear regression will be based on classical Vectorization-based' linear regression where the weight matrix W is calculated by calculating the gradient of least squares cost. We will also be adding Gaussian noise (∆) of mean 0 and a standard deviation of 0.1 to replicate the behavior of NISQ systems. The linear algebra operations have been implemented using a python framework for numerical simulations of quantum systems called QuTip [53]. The results are presented in table 2. The results of the linear regression model and the quantum regression model are both very comparable and we can see that the quantum regression model performs slightly better in spite of the addition of Gaussian noise which could be due to the sparse encoding of the input vectors: whereŶ are the predicted atomization energies andX is the testing/validation dataset.

Quantum radial basis function neural network (Q-RBFNN)
We will now be implementing a Q-RBFNN by amplitude encoding of the feature vectors. In reference [54], the authors proposed a different version of a quantum RBFN, intending to solve classification problems and implemented by defining the state of the weight as a tensor product of single-qubit states. Instead, in our article, we propose to solve a regression problem and implement it by defining a new quantum equivalent activation function based on the Hilbert-Schmidt norm. We will be encoding the feature vector as amplitudes of standard basis based on the qRAM mapping [55,56]. This encoding will exponentially reduce the computation time for finding the centroids of the hidden layer of radial basis function neural network (RBFNN). In classical RBFNN, we find the centroids in the clustering step of the algorithm by calculating the Euclidian distance of the vectors by assuming them as Cartesian coordinates in a high dimensional space, but in Q-RBFNN, we will be using the trace distance of the density matrices of our amplitude encoded states. We will also be adding multivariate Gaussian noise (∆) of mean 0 and covariance 0.1 to replicate the behavior of NISQ systems. We will be using the Hilbert-Schmidt norm of the difference between the centroids and input states as the radial basis function of our Q-RBFNN model. Hilbert Schmidt norm is a matrix norm operator, it is the square root of the trace of the inner product of two states. Since we are using a specific radial basis function, the Q-RBFNN algorithm can be interpreted as a specific instance of the RBFNN algorithm. The results have been presented in table 3. We can see a percentage difference of 26.4963 in the MAE score and a percentage difference of 20.0184 in the RMSE score. While there is a significant difference in the performance of the classical and quantum RBFNN models, the R2 score and the Pearson correlation value of the Q-RBFNN model signifies that the model is capable of learning the distribution of the input data.
where ϕ is the radial basis function of choice.
4. Construct the weight matrix and then calculate the predicted outputs.
whereŶ are the predicted atomization energies andX is the testing/validation dataset.

Hybrid quantum neural network and QFT-based hybrid QNN
We will now be implementing a hybrid quantum neural network using the variational method. The feedforward network will be built using classical convolutional and dense layers and variational circuits with parametrized rotation gates, and the optimization and updating of the parameters corresponding to the network gates will be done using classical methods. The input feature vectors will be encoded into quantum states using amplitude encoding and, each quantum state corresponding to the feature vector will be used as the input to the classical network and the output vector of the classical layer before the quantum circuit will be used as the input state to the quantum circuit. The quantum circuit is comprised of a series of parametrized rotation gates along x, y and z-axis and the measurements of the circuit are mapped to a single output value using fully connected layers, and we calculate the cost between this output value and the label to update the parameters of the circuit based on a hybrid quantum-classical optimization method using the parameter-shift rule and gate decomposition as described in the paper by Crooks [57]. The atomization energies of the samples have been rescaled to the range [0, 1], and we will be primarily focusing on studying the ability of our hybrid QNN model to learn the distribution of the data, and for this, we have chosen training convergence as the parameter of interest. We have used IBM's Qiskit library with the QASM simulator backend for building the quantum circuit and PyTorch library for implementing the classical neural network part and for the optimization of the entire model. A general representation of the models are shown in figures 9 and 10.
1. Encode the input vectors as quantum states by amplitude encoding them as amplitudes of standard basis.
(a) For an input vector of dimension m, the encoded quantum state is (can be done using m qubits): 2. Perform clustering based on the trace distance of density matrices and let the centroids be |µ K ⟩ for K clusters.
(a) The trace distance of density matrices if denoted as: 3. Calculate the Hidden layer matrix using the Hilbert-Schmidt norm of the difference between the centroids and input states: 4. Construct the weight matrix and then calculate the predicted outputs.
whereŶ are the predicted atomization energies andX is the testing/validation dataset.
Additionally, we will also be studying a quantum Fourier transform (QFT)-based hybrid QNN model. Fourier transform is a way of representing temporal data in the frequency space. Fourier transform is one of the most frequently used operations in signal processing and also machine learning. Analyzing the data in frequency space could provide new information about the distribution of the data and can also help ML models learn the representations of the data more efficiently. The QFT is analogous to classical discrete Fourier transform, but it is applied over the amplitudes of a given wavefunction, converting states in computational basis to Fourier basis. QFT circuit can be implemented using a series of controlled phase rotations and SWAP gates. The QFT circuit transforms the bases with the help of a series of H-gates that transform the Z-basis states |0⟩ and |1⟩ to the X-basis states |+⟩ and |−⟩. They similarly map all multi-qubit states in the computational basis to their corresponding Fourier basis states. A six qubit QFT circuit is shown in figure 11. We will be using the amplitude encoded feature vectors as the input to a QFT circuit (a 64-dimensional input state is passed to a six-Qubit QFT circuit), and the output states of the QFT circuit will be used as the inputs to a hybrid QNN. For a classical k-dimensional vector, the quantum state encoding requires log 2 (k) qubits, and in general, the time complexity of quantum state preparation is O(log 2 (N)) [56],   where N is the dimension of the vector k. The complexity of arbitrary quantum state preparation is O(N). We hypothesize that the Fourier basis states could help our QNN model achieve better results compared to a standard hybrid QNN model.
As we can see from figure 12, the L1-Loss on the training samples for both the models has plateaued after a few epochs and converged to a relatively low value, which signifies that our QNN models are able to learn the internal distribution of the data, but, more interestingly, the loss of the QFT-based QNN model converged to a lower value compared to the standard hybrid QNN model, which compares well with our hypothesis. Although the QFT-based model demonstrates a better training convergence, the QFT block may be substituted by Classical Fast Fourier Transform since it provides no quantum speedup due to the performance bottleneck introduced by the preparation and readout of the states. We have included the QFT algorithm under the perspective of changing the classical node (convolutional and fully connected) by a quantum counterpart that does not need to read out classically the solution first.

Conclusion
This paper proposes a detailed methodology for the application of various quantum machine learning models for the prediction of molecular atomization energies. Firstly, we are able to efficiently encode each molecule into a vector using the information from the Coulomb matrix, atomic charges, and the Cartesian coordinates of the atoms in a respective molecule. Then we have successfully derived compressed representations of the molecules by training a deep convolutional autoencoder on the encoded vectors of the molecules. The reconstruction outputs of the autoencoder are in close agreement to the input encoded vectors. We have then used the feature vectors from the fully-connected bottleneck layer of the autoencoder as the compressed representations of the molecules and formed a dataset for the application quantum regression models. Before applying the quantum regression models, we have tested some classical deep learning and machine learning models to check the correlation between the feature vectors and the atomization energy of a particular molecule. We have calculated metrics such as MAE, RMSE, coefficient of determination (R2), and the Pearson correlation, and observed very good results indicating a close correlation between the feature vectors and the atomization energies of the molecules.
We have applied quantum machine learning models on the feature vectors for solving the regression problem of estimating the atomization energy of a molecule. We have studied four different quantum machine learning models, namely quantum linear regression, radial basis function neural network, a hybrid quantum neural network, and a QFT-based hybrid quantum neural network. The results of the quantum linear regression model are similar to the results obtained from a classical linear regression model. Also, we have proposed a new methodology for the implementation of a quantized version of the RBFNN model based on the trace distance of the density matrices of amplitude encoded states and the concept of Hilbert-Schmidt norm. We have successfully trained the Q-RBFNN model and presented the results which closely resembled those obtained from the classical RBFNN model. Lastly, we have implemented and studied the ability of a hybrid quantum neural network and a QFT-based hybrid QNN model to learn the internal representation of the data. We have studied the training convergence of both the models and presented two significant observations. Firstly, we are able to verify that the L1-loss metric converged to a low value for both the models, signifying that the models are able to map the feature vectors to the labels and secondly, we have seen that the loss corresponding to the training convergence of QFT-based hybrid QNN model converged to a lower value than that of the standard hybrid QNN model indicating that the Fourier basis states offer a better representation of the data compared to the standard basis states. The hybrid quantum neural network models do not seem to provide any speedup over their classical counterparts. While the results of quantum machine learning models discussed in this paper might not be comparable to the results of the current state-of-the-art classical machine learning models, they still offer a reasonable estimate of the capabilities of quantum and hybrid quantum algorithms for regression tasks, and the current NISQ systems might be suitable platforms for designing and developing quantum regression models. We hope to improve on the methods discussed in this paper and explore more quantum machine learning and QFT-based quantum machine learning methodologies. We hope that this paper becomes a precursor to more research in the field of quantum machine learning and motivate others to work on novel quantum machine learning methodologies for regression tasks in computational drug design.