Spherical convolutions on molecular graphs for protein model quality assessment

Processing information on 3D objects requires methods stable to rigid-body transformations, in particular rotations, of the input data. In image processing tasks, convolutional neural networks achieve this property using rotation-equivariant operations. However, contrary to images, graphs generally have irregular topology. This makes it challenging to define a rotation-equivariant convolution operation on these structures. In this work, we propose Spherical Graph Convolutional Network (S-GCN) that processes 3D models of proteins represented as molecular graphs. In a protein molecule, individual amino acids have common topological elements. This allows us to unambiguously associate each amino acid with a local coordinate system and construct rotation-equivariant spherical filters that operate on angular information between graph nodes. Within the framework of the protein model quality assessment problem, we demonstrate that the proposed spherical convolution method significantly improves the quality of model assessment compared to the standard message-passing approach. It is also comparable to state-of-the-art methods, as we demonstrate on Critical Assessment of Structure Prediction (CASP) benchmarks. The proposed technique operates only on geometric features of protein 3D models. This makes it universal and applicable to any other geometric-learning task where the graph structure allows constructing local coordinate systems.


Introduction
Prediction of protein three-dimensional (3D) structure is an important problem in structural biology and structural bioinformatics. Despite tremendous progress in this field [1,2,3,4], the accuracy of the predicted structures varies significantly depending on the availability of additional information and on the number of homologous structures and sequences in the databases [5,6,7,8]. Therefore, estimating the reliability of the predicted models, as well as assessing their local structural fragments, is crucial for the practical application of these predictions.
The problem of protein model quality assessment (MQA) has been recognized by the protein structure modeling community and became one of the subchallenges of CASP, the Critical Assessment of protein Structure Prediction community-wide challenge [9,10]. All of the state-of-the-art MQA methods use, to a certain extent, supervised or unsupervised machine learning. Initially, statistical potentials [11], shallow neural networks [12], regression methods, and support vector machines [13,14] were widely used. More recently, this problem has also attracted attention from the machine-learning community. This triggered the development of more advanced approaches, such as deep learning-based techniques [15,16,17] and graph convolutional networks (GCN) [18,19,20]. The latter methods operate on a molecular graph representation of protein models.
In this work, we propose to capture the 3D structure of a molecular graph using a convolution operation based on spherical harmonics. The main idea of our approach is to learn spatial filters in a reference orientation of each graph node. Indeed, proteins are chained molecules with a repeated topology of the backbone. Thus, using local coordinate frames constructed on the protein's backbone, we can build rotation-equivariant spherical filters. We then incorporate these filters into a message-passing framework and design a new method called Spherical Graph Convolutional Network (S-GCN), which significantly outperforms the classical GCN architecture.
Most protein MQA methods operate on the atom-level representation of a protein molecule [14,11,21,16,20]. At the same time, the state-of-the-art methods use various types of features that often include information about the evolution of molecules or other biological characteristics. In contrast, S-GCN works with the residue-level protein representation, which significantly reduces computational costs and the number of parameters. Also, our method processes only geometric information, i.e., the input feature vector of each amino acid contains only three geometric features and a one-hot vector representing the type of the amino acid. This work demonstrates that state-of-the-art quality on the protein model assessment task can be achieved using only geometric properties of protein models, without any physico-chemical prior information, which often requires additional expensive computations.
The main results of our work can be summarized as follows:
• We propose a new message-passing method based on trainable rotation-equivariant spherical filters.
• The proposed method significantly improves the quality of model assessment compared to a classical GCN approach applied to the same input configuration.
• Despite the residue-level representation and purely geometric input features, the results of the proposed method are comparable to the state of the art.
Related Work

Graph neural networks (GNNs) for molecular graphs. In recent years, various GNNs were proposed to address the problem of learning on molecular graphs. Starting with the message-passing paradigm in the molecular graph domain [36], multiple approaches have since elaborated on this idea [37,38,39,40,41,42]. All of them were designed to operate on small molecules. For example, in QM9 [43,44], a popular benchmark used to evaluate these methods, molecules consist of up to 23 atoms. On the contrary, a protein molecule can contain thousands of atoms, and this fact requires different approaches that take into account the size of the data. Recently, GNNs have also started to be applied to protein graphs for solving various problems such as protein design [45], protein docking [46,47], classification [48,49], and quality assessment [19,18,20].
Equivariance. Processing information in 3D must be stable against rigid-body transformations of the input data. This stability can be achieved using equivariant operations, which is a very active research topic, especially regarding rotational equivariance. For example, rotation-equivariant CNNs were proposed for spherical images using correlations on a sphere [50] and then extended to fully Fourier-space architectures [51,52]. Similar architectures can be constructed for rigid-body motions using rotation- and translation-equivariant tensor field networks [38,48]. Spherical harmonics kernels have also been applied to point-cloud data [53]. Alternatively, for some types of volumetric data, a rotation-equivariant representation can be constructed with oriented local coordinate frames [16]. The same idea can be applied to the protein graph representation, where the spatial relation between local frames can be encoded using spatial edge features [45] or additional edge descriptors [19]. For general molecular graphs, the problem is more difficult. Still, there has been significant progress using, e.g., the message-passing formalism with messages containing radial and directional information about neighboring graph nodes [42].
Proposed Method

Protein graph
A protein molecule is a chain of amino acids, or residues, folded in 3D space. We construct a graph G of the protein molecule by splitting the surrounding space into cells using the Voronoi tessellation method Voronota [29]. Nodes of the resulting graph correspond to the protein residues, and edges are associated with the pairs of residues whose Voronoi cells have a non-zero contact surface. Figure 1 schematically shows the graph construction.
Each node v of the graph G contains a feature vector x_v associated with the corresponding protein residue. These features include one of 20 amino-acid types encoded with the one-hot representation, the solvent-accessible surface area of the residue, the volume of the residue's Voronoi cell, and the "buriedness" of the residue, which is the topological distance in the graph G to the nearest solvent-accessible node. We represent the whole set of nodes as a feature matrix X ∈ R^{N×d}, where N is the number of residues and d = 23 is the size of the feature vector.
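As a minimal sketch, such a 23-dimensional node feature vector could be assembled as follows; the amino-acid ordering, the helper name, and the example values are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Canonical 20 amino acids; this particular ordering is an assumption.
AMINO_ACIDS = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]

def node_features(aa_type, sasa, cell_volume, buriedness):
    """Assemble the d = 23 feature vector of one graph node:
    a 20-dim one-hot amino-acid type plus three geometric features
    (solvent-accessible surface area, Voronoi cell volume, buriedness)."""
    one_hot = np.zeros(len(AMINO_ACIDS))
    one_hot[AMINO_ACIDS.index(aa_type)] = 1.0
    return np.concatenate([one_hot, [sasa, cell_volume, buriedness]])
```

Stacking these vectors over all N residues yields the feature matrix X ∈ R^{N×d}.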
To describe the edges of the graph G, we will use the following notation. Let A be the symmetric binary adjacency matrix of the graph. For any pair of residues v and u, the two corresponding entries of the matrix A equal 1 if u and v share an edge, and zero otherwise. In our setting, the graph G does not have self-loops, hence the main diagonal elements of the matrix A are zeros. To refer to the neighbors of a node v in the graph G, we will use the notation N(v).
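A minimal sketch of this adjacency construction, assuming the Voronoi contacts are already available as a list of residue index pairs (the helper name is ours):

```python
import numpy as np

def adjacency_from_contacts(n_residues, contacts):
    """Build the symmetric binary adjacency matrix A from pairs of residues
    whose Voronoi cells share a non-zero contact surface."""
    A = np.zeros((n_residues, n_residues))
    for v, u in contacts:
        if v != u:                   # the graph has no self-loops
            A[v, u] = A[u, v] = 1.0  # A is symmetric by construction
    return A
```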

Spherical harmonics
Let us consider a complex square-integrable function f(θ, ϕ) defined on a unit sphere. This function can be expanded in a polynomial basis using spherical harmonics as the basis functions,

f(θ, ϕ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} w_l^m Y_l^m(θ, ϕ),    (1)

where w_l^m are the expansion coefficients, and Y_l^m(θ, ϕ) are the spherical harmonics [54],

Y_l^m(θ, ϕ) = sqrt( (2l+1)/(4π) · (l−m)!/(l+m)! ) P_l^m(cos θ) e^{imϕ}.    (2)

Here, P_l^m(cos θ) are the associated Legendre polynomials [54]. We should also note that a real function on a unit sphere can be decomposed in a polynomial basis more compactly using real spherical harmonics Ỹ_l^m as the basis functions,

f(θ, ϕ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} w_l^m Ỹ_l^m(θ, ϕ).    (3)

Below we will be using this real basis, which is specified in Supplementary Materials.
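The exact real basis is given in the paper's Supplementary Materials; as a sketch, one common convention for real spherical harmonics can be obtained from SciPy's complex ones (the function name and convention here are our own assumptions):

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonic of degree l and order m under one common
    convention: sqrt(2) * Re(Y) for m > 0, sqrt(2) * Im(Y) for m < 0,
    and Re(Y) for m = 0. Here theta is the polar angle and phi the
    azimuth; note that scipy's sph_harm takes (order, degree, azimuth,
    polar) in that order."""
    Y = sph_harm(abs(m), l, phi, theta)
    if m > 0:
        return np.sqrt(2.0) * Y.real
    if m < 0:
        return np.sqrt(2.0) * Y.imag
    return Y.real
```

Sanity checks against known values: Ỹ_0^0 is the constant 1/(2√π), and Ỹ_1^0(θ = 0) equals sqrt(3/(4π)).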

Local coordinate system
The protein backbone consists of repetitions of the atoms C, Cα, N, O. This allows us to unambiguously associate each residue with a local coordinate system. Indeed, for each residue we can define the normalized Cα–N vector as the x-axis; the unit vector lying in the C–Cα–N plane, orthogonal to x, and having a positive dot product with Cα–C as the y-axis; and the vector product of x with y as the z-axis. Then, given a node v, we can associate each neighbor u ∈ N(v) with a pair of spherical angles

(θ_vu, ϕ_vu),

which specify the angular position of the projection of the node u onto a unit sphere in the local coordinate system of v. An example of a local coordinate system is schematically shown in Figure 1A. Now, having an unambiguous orientation for each node in the graph, we can construct a rotation-equivariant convolution operation.
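The frame construction above can be sketched as follows; the rotation-invariance check at the end is exactly the property that makes the downstream spherical filters rotation-equivariant. Function names and coordinates are our own illustrative assumptions:

```python
import numpy as np

def local_frame(c, ca, n):
    """Local coordinate system of a residue from its backbone atoms C,
    C-alpha, and N, following the construction in the text. Rows of the
    returned matrix are the x, y, z axes expressed in global coordinates."""
    x = n - ca
    x /= np.linalg.norm(x)           # x: normalized C-alpha -> N vector
    v = c - ca                       # C-alpha -> C vector
    y = v - np.dot(v, x) * x         # project out x: stays in C-Calpha-N plane,
    y /= np.linalg.norm(y)           # positive dot product with v by construction
    z = np.cross(x, y)               # z: vector product of x with y
    return np.stack([x, y, z])

def spherical_angles(frame, ca, neighbor_ca):
    """Spherical angles (theta, phi) of a neighbor residue in the local frame."""
    d = frame @ (neighbor_ca - ca)
    d /= np.linalg.norm(d)
    theta = np.arccos(np.clip(d[2], -1.0, 1.0))  # polar angle from the z-axis
    phi = np.arctan2(d[1], d[0])                 # azimuth in the x-y plane
    return theta, phi
```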

Spherical convolution
We can approximate the expansion (3) of the function f(θ, ϕ) by cutting the series at the maximum expansion order L,

f(θ, ϕ) ≈ Σ_{l=0}^{L} Σ_{m=−l}^{l} w_l^m Y_l^m(θ, ϕ).    (4)

The same approximation can be obtained for a matrix function F(θ, ϕ) ≈ Σ_{l=0}^{L} Σ_{m=−l}^{l} W_l^m Y_l^m(θ, ϕ), where the matrices W_l^m denote the expansion coefficients of the function F in the Y_l^m basis. Finally, we can introduce the spherical convolution operation for the vertex v in the following way,

(F ⋆ x)_v = Σ_{u∈N(v)} F(θ_vu, ϕ_vu) x_u = Σ_{u∈N(v)} Σ_{l=0}^{L} Σ_{m=−l}^{l} W_l^m Y_l^m(θ_vu, ϕ_vu) x_u.    (5)

Considering the matrices W_l^m to be optimized parameters, we will thus learn a spherical filter. We should specifically emphasize that the resulting filters are rotation-equivariant by construction, since the angles θ_vu and ϕ_vu are measured in the local coordinate system of the node v.
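A per-node sketch of this convolution under stated assumptions (a common real-harmonics convention standing in for the paper's basis; all names are ours). With L = 0 and W_0^0 chosen to cancel the constant Y_0^0, the operation degenerates to a plain sum over neighbor features, which is a handy correctness check:

```python
import numpy as np
from scipy.special import sph_harm

def real_Y(l, m, theta, phi):
    # One common real-harmonics convention; the paper's exact basis is
    # specified in its Supplementary Materials.
    Y = sph_harm(abs(m), l, phi, theta)
    return (np.sqrt(2.0) * Y.real if m > 0 else
            np.sqrt(2.0) * Y.imag if m < 0 else Y.real)

def spherical_conv(neighbor_feats, neighbor_angles, W, L):
    """Spherical convolution at one node v: the sum over neighbors u and
    orders (l, m) of W_l^m * Y_l^m(theta_vu, phi_vu) * x_u.
    neighbor_feats: (n, d_in) array of neighbor feature vectors;
    neighbor_angles: list of (theta, phi) pairs in v's local frame;
    W: dict mapping (l, m) to a (d_out, d_in) filter matrix."""
    d_out = next(iter(W.values())).shape[0]
    out = np.zeros(d_out)
    for x_u, (theta, phi) in zip(neighbor_feats, neighbor_angles):
        for l in range(L + 1):
            for m in range(-l, l + 1):
                out += real_Y(l, m, theta, phi) * (W[(l, m)] @ x_u)
    return out
```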

Neural network
The distinctive feature of convolutional networks built on spatial graphs is the way the graph nodes exchange information by passing messages to each other. On each layer of the network, nodes' feature vectors are combined and updated using the information from the neighboring nodes [55,56]. In our implementation, we use the proposed spherical convolution operation (5) for this information exchange.
Let A_Ω be a matrix of local angular coordinates for each node. This means that, for any pair of adjacent graph nodes v and u, the corresponding entry of the matrix A_Ω is the pair of angular coordinates (θ_vu, ϕ_vu) of u with respect to the local coordinate system of v. We also denote by Y_l^m(A_Ω) the result of the element-wise application of the spherical harmonic Y_l^m to the matrix A_Ω. Then, the kth layer of the spherical graph convolutional network can be expressed as follows,

H^k = σ( Σ_{l=0}^{L} Σ_{m=−l}^{l} Y_l^m(A_Ω) H^{k−1} W_l^m + H^{k−1} W + b ),    (6)

where σ is a nonlinear activation function, W_l^m and W are trainable parameters, b is a trainable bias vector, H^{k−1} and H^k are the nodes' feature matrices before and after applying the layer, and H^0 ≡ X is the input feature matrix. If we set the maximum expansion order L = 0, the harmonic Y_0^0 is a constant, so the matrix Y_0^0(A_Ω) is proportional to the adjacency matrix A and, with the constant absorbed into the weights, the operation (6) transforms to a standard message-passing form,

H^k = σ( A H^{k−1} W_0^0 + H^{k−1} W + b ).    (7)

In Supplementary Materials, we also provide a modification of the proposed spherical convolution layer (6), which explicitly uses information about contact surface areas between Voronoi cells, and discuss the corresponding network architecture.
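A dense matrix-form sketch of one such layer, assuming the matrices Y_l^m(A_Ω) are precomputed with zeros at non-adjacent entries, and using ReLU as a stand-in for the generic activation σ (all names are ours, not the paper's implementation):

```python
import numpy as np

def sgcn_layer(H, Y_mats, W_sph, W_self, b):
    """One spherical graph convolution layer in matrix form:
    H_out = sigma( sum_{l,m} Y_l^m(A_Omega) H W_l^m + H W + b ).
    H: (N, d_in) node feature matrix; Y_mats and W_sph: dicts mapping
    (l, m) to an (N, N) harmonic matrix and a (d_in, d_out) weight matrix;
    W_self: (d_in, d_out) self-connection weights; b: (d_out,) bias."""
    out = H @ W_self + b
    for key, Y in Y_mats.items():
        out += Y @ H @ W_sph[key]
    return np.maximum(out, 0.0)  # ReLU as the nonlinearity sigma
```

With L = 0 and a single constant harmonic matrix proportional to A, the layer reduces exactly to the standard message-passing update.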

Experiment
The main purpose of the model quality assessment task is to evaluate the deviation between a generated model of a protein molecule and its native, or target, structure. If the target structure is known, the quality of the model can be calculated by computing one of the specifically designed metrics, e.g., CAD-score [57], lDDT [58], or GDT-TS [59]. Most often, however, experimental protein structures are unknown, hence the need for protein structure prediction and model quality assessment. In this section, we report the results of the protein model quality assessment task obtained by our spherical architectures and the GCN baseline. We trained all of our networks using local per-residue CAD-scores as the ground truth. These have been shown to be a more informative and stable metric compared to other MQA measures with respect to local structural perturbations of a protein molecule [60]. To retrieve the global per-model scores, we averaged the predicted local scores.
We provide results of S-GCN along with several state-of-the-art MQA methods that are described in detail below. We would also like to emphasize that, for several reasons, in this work we do not compare S-GCN with recent GNNs designed for small molecules. First of all, we attempted to train tensor field networks [38] and DimeNet [42] on protein molecules but did not get any adequate results. Secondly, we are unable to test S-GCN on established ML benchmarks such as QM9 [43,44] due to the specificity of the graph representation in our method.

Datasets
For our experiments, we collected data from the Critical Assessment of Structure Prediction (CASP) benchmarks [61,4]. They contain experimentally obtained native protein structures and the corresponding 3D models predicted by the CASP challenge participants.
For training, we used data from CASP8, CASP9, CASP10, and CASP11 stage2 submissions. For each target, we additionally generated 50 near-native models [62] in order to enrich the training dataset with high-quality examples.
For validation and selection of hyperparameters, we used data from CASP12 stage2 submissions.
All models that we used for training and validation were initially filtered and preprocessed. More precisely, we excluded targets that had only models with low CAD-scores, and we preprocessed all models by removing residues that were not present in the target structure. More details and the list of all targets are available in Supplementary Materials. In total, we had 333 target structures and 73,418 models from CASP8-CASP11 for training, and 39 targets and 5,411 models from CASP12 for validation.
Finally, to test our architectures, we used unrefined data from CASP13 (73 target structures, 10,882 models) and unrefined data from CASP12 (38 target structures, 5,471 models).
For each model, we precomputed the matrices Y_l^m(A_Ω) up to the 10th expansion order. These matrices were the most space-consuming part of our dataset, as a spherical harmonic expansion of order L requires storing L² coefficients for each pair of adjacent nodes in a graph.

Metrics
For the evaluation of the methods, we chose z-scores, MSE, the determination coefficient R², and Pearson and Spearman correlations, as described in more detail below. We used global CAD-scores as the ground truth for the assessment. We computed z-scores for the top-predicted protein models for each target and then averaged them over all targets, as explained in more detail in Supplementary Materials. For the MSE, R², and correlations, we used two different ways of calculation: per-target and global. In the per-target approach, we computed the metrics separately within each protein target and then averaged the results over all targets (we averaged correlations using the Fisher transformation [63]). In the global approach, we stacked the scores of all protein models into one vector and calculated the metrics on this vector. For each metric, we also computed bootstrapped means and confidence intervals. For the global metrics, a bootstrapped sample is chosen from the whole set of models. For the per-target metrics, a bootstrapped sample is a sample of targets together with their models. These results are available in Supplementary Materials.
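The Fisher-transformed averaging of per-target correlations mentioned above can be sketched as follows (the helper name and the clipping tolerance are our own assumptions):

```python
import numpy as np

def average_correlations(per_target_r):
    """Average per-target Pearson correlations via the Fisher z-transform:
    each correlation r is mapped to z = arctanh(r), the z-values are
    averaged, and the mean is mapped back with tanh. Correlations are
    clipped away from +/-1 to keep arctanh finite."""
    r = np.clip(np.asarray(per_target_r, dtype=float), -0.999999, 0.999999)
    return np.tanh(np.arctanh(r).mean())
```

Unlike a plain arithmetic mean, this averaging is unbiased in the z-domain, where sample correlations are approximately normally distributed.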

Baseline architecture
For the baseline, we built a standard graph neural network based on the message-passing operation described in eq. (7). The structure of the proposed architecture can be split into three main parts. The encoder is a set of fully-connected layers that transform the residues' features into a high-dimensional space. The message-passing part is a set of graph convolution layers (7) that capture the structure of a protein graph and work as a feature extractor. Finally, the scorer is a set of fully-connected layers with a sigmoid at the end. They form a multilayer perceptron and use the obtained features to predict the scores of each protein residue. As a result, we obtain three main design parameters: the numbers of encoder, message-passing, and scoring layers. We performed a grid search on the values of these parameters and found that the optimal architecture had 3 encoder layers, 8 message-passing layers, and 3 scoring layers. For each layer, we used the ELU activation function and a dropout rate of 0.3, which we found to be optimal. In total, our baseline network contains 339,053 trainable parameters. Table 1 briefly lists the final architecture.

Results

We took the results of the competing methods for CASP12 and CASP13, and the results of SBROD for CASP13, from the official CASP archive at predictioncenter.org. To obtain the results of SBROD on CASP12, and of Ornate and VoroCNN on CASP12 and CASP13, we ran these methods locally. We should emphasize that the main results we report in this work were obtained on the CASP13 dataset, which was not used during training and validation. However, to give a complete picture, we also provide the results obtained on the unrefined CASP12 dataset. Table 2 lists the results for CASP12 and Table 3 lists the results for CASP13.
We can see a large performance gap between the baseline network and the other methods. This can be explained by the fact that the baseline approach uses neither the 3D structure of the graph nor the additional physico-chemical or biological features that are widely adopted by the state-of-the-art methods. At the same time, we would like to emphasize that our spherical graph convolutional networks, which explicitly use the 3D structure of the data, managed to achieve a similar or better quality of predictions compared to the state-of-the-art methods. Figure 2A shows some of the spherical filters learned by the 5th and the 10th order S-GCNs. We can see their rather complex shape, which is difficult to interpret solely from physico-chemical considerations.
Tables 2 and 3 also demonstrate that using a higher order of the spherical harmonic expansion improves the MSE and R² metrics. At the same time, the order 5 seems to outperform the 10th order in the correlation and z-score metrics. This can be explained by the absolute values of the predictions. Indeed, Figure 2B illustrates that the predictions of the 10th order network are closer to the diagonal, thus improving MSE and R². This also explains why some methods can have high correlation metrics while their predictions are shifted with respect to the main diagonal, which results in a negative R². Thus, we can conclude that a higher polynomial order of the network allows us to better predict the absolute values of protein scores.
Regarding the correlation metrics, the 5th order S-GCN performs better than the others. We can conclude that it should be the method of choice for ranking protein models and selecting the best model from a given set. Also, taking into account that the 10th order S-GCN has considerably more trainable parameters and takes four times more disk space than the 5th order S-GCN, it makes more sense to use the latter for practical tasks. We can also see that the last scoring layer improves the correlation metrics. S-GCNs also demonstrate a better prediction quality in terms of z-scores, but, as we show in Supplementary Materials, these metrics are not stable, and we cannot confidently say that one method outperforms another because they all have intersecting confidence intervals.

Conclusion
In this work, we applied spherical convolutions to capture the 3D structure of a protein graph. The results demonstrate that our method gives a significant improvement in the quality of predictions compared to a baseline without orientational relations between the graph nodes. The spherical convolution method can also be combined with other approaches to protein model quality assessment and can potentially use more input features. Thus, we believe it will be possible to achieve even better prediction results by adding biological and chemical information to the input graphs.
In addition, we would like to note that the idea of spherical convolutions is universal and can be applied to various types of graph-learning tasks, provided that the graph structure permits us to define an equivariant local coordinate system for each graph node.

Figure 1 :
Figure 1: Schematic representation of a molecular graph. (A) The 3D protein structure is partitioned into Voronoi cells, shown with the dashed lines. The central amino acid has the associated coordinate system, which is built according to the topology of its backbone (atoms C, Cα, N) with the center at the position of the Cα atom. R symbols denote amino acid residues. The spherical angles ϕ and θ of the neighboring residues are computed with respect to the local coordinate system of the central residue. (B) Graph corresponding to the Voronoi tessellation: v is the central node, u is its neighbor, and x_v and x_u are the corresponding feature vectors, which are also shown with colored boxes.

Figure 2 :
Figure 2: (A) Examples of spherical filters learned by S-GCN of order 5 (top row) and S-GCN of order 10 (bottom row). The distance to the center is proportional to the absolute function value. The red color corresponds to positive values, and the blue color to negative ones. (B) Histograms comparing the ground-truth scores and the predictions of spherical graph convolutional networks on the CASP13 dataset.

Table 2 :
Comparison of S-GCN and S-GCN_s with the baseline network and the state-of-the-art MQA methods on the unrefined CASP12 stage2 dataset. Parameters in parentheses correspond to the order of the spherical harmonic expansion.

Table 3 :
Comparison of S-GCN and S-GCN_s with the baseline network and the state-of-the-art MQA methods on the unrefined CASP13 stage2 dataset. Parameters in parentheses correspond to the order of the spherical harmonic expansion.