MultAtt-RippleNet: Multi-attribute and Knowledge-based Attention Fusion Recommendation Model

The RippleNet model jointly trains a knowledge graph with the recommendation algorithm. Recommendation systems built on this model perform well, but the model ignores the semantic relevance between the entities reached at each hop of the knowledge graph and does not take into account that users prefer different relations to different degrees, so a new model, MultAtt-RippleNet, is proposed. In the new model, we fuse the user's preference weights for relations with knowledge-aware weights to obtain more accurate and realistic user preference weights for each ripple. The experimental results show that MultAtt-RippleNet reaches up to 92.67% AUC and 85.90% ACC, a clear overall improvement, addressing the previously limited generalization ability and accuracy of the model.


Introduction
With the development of information technology, the application fields of recommendation systems [1] based on data analysis keep expanding, for example shopping recommendations based on user information and music or movie recommendations based on interaction history. The most troubling aspects of traditional recommendation models are the cold-start problem [2] and severe data sparsity [3], which lead to poor overall recommendation results. Research has therefore gradually turned to knowledge graphs, using them as auxiliary knowledge to make recommendations better and more accurate.
The Knowledge Graph (KG) [4] was introduced by Google in 2012 as a search-engine feature. A knowledge graph can be used to introduce auxiliary information as model input, such as contextual information, attributes of users or items, and, in RippleNet, associations between movie attributes. The semantic relations and auxiliary information [5] of the knowledge graph help to supplement the sparse interaction data of the recommendation system and alleviate the cold-start problem, improving recommendation accuracy.
Personalized recommendation based on knowledge graphs has been applied to scenarios such as movies and books. Current knowledge-graph-based recommendation methods mainly include embedding methods based on representation learning [6] and meta-path methods. The MultAtt-RippleNet model used in this paper is based on joint training of embeddings and paths [7], with knowledge-aware attention and self-attention [8] weights added to improve accuracy.

Algorithm improvements
The RippleNet [9] model derives each user's preference portrait from the user's preference data, and the per-hop ripple embeddings of the entities the user has historically clicked directly affect model performance. The key idea is to detect and propagate the user's preferences: the user's historical interests are taken as a seed set in the KG, and the user's interests and potential interests are iteratively expanded along the links of the KG, layer by layer.

Formally, let $U = \{u_1, u_2, u_3, \ldots\}$ and $V = \{v_1, v_2, v_3, \ldots\}$ denote the user and item sets, respectively, and let $X = \{x_{uv} \mid u \in U, v \in V\}$ be the user-item interaction matrix [10], where $x_{uv} = 1$ if user $u$ has interacted with item $v$ and $x_{uv} = 0$ otherwise. The set of $k$-hop related entities for user $u$ is defined as

$\varepsilon_u^{k} = \{\, t \mid (h, r, t) \in \mathcal{G},\ h \in \varepsilon_u^{k-1} \,\}, \quad k = 1, 2, \ldots, H$  (1)

When $k = 0$, the seed set of user $u$ in the KG is the set of items the user has clicked in the past:

$\varepsilon_u^{0} = \{\, v \mid x_{uv} = 1 \,\}$  (2)

Starting from $\varepsilon_u^{k-1}$, the set of knowledge triples forming the $k$-hop ripple set of user $u$ is defined as

$S_u^{k} = \{\, (h, r, t) \mid (h, r, t) \in \mathcal{G},\ h \in \varepsilon_u^{k-1} \,\}$  (3)

The RippleNet model ignores the preference weights of the neighbors around each entity in the ripple diagram of a user's historical clicks, so we add knowledge-aware attention to obtain the weights of the neighbors around each entity the user has clicked, which in turn directly affects the accuracy of the recommendation results. The new model, MultAtt-RippleNet, also incorporates the user's preference weights for the relations in each hop of the knowledge graph, which further improves model performance.
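As a concrete illustration of the ripple-set construction above, the following is a minimal sketch assuming the KG is given as a list of (head, relation, tail) triples and that user_clicks maps each user to the items with $x_{uv} = 1$; the function and variable names are illustrative and not from the original implementation.

```python
from collections import defaultdict

def build_ripple_sets(kg_triples, user_clicks, n_hops):
    """Construct the per-user ripple sets S_u^k described in the text.

    kg_triples : iterable of (head, relation, tail) entity-id triples
    user_clicks: dict user_id -> set of item/entity ids with x_uv = 1 (seed set E_u^0)
    n_hops     : number of propagation hops H
    """
    # Index the KG by head entity so each expansion step is a simple lookup.
    kg = defaultdict(list)
    for h, r, t in kg_triples:
        kg[h].append((h, r, t))

    ripple_sets = {}  # user_id -> [S_u^1, ..., S_u^H]
    for user, seeds in user_clicks.items():
        hops = []
        frontier = set(seeds)                       # E_u^0: the user's clicked items
        for _ in range(n_hops):
            hop_triples = []
            for head in frontier:                   # expand every entity reached so far
                hop_triples.extend(kg.get(head, []))
            hops.append(hop_triples)                # S_u^k: triples whose heads lie in E_u^{k-1}
            frontier = {t for _, _, t in hop_triples}  # E_u^k: tails become the next frontier
        ripple_sets[user] = hops
    return ripple_sets
```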

Fusion knowledge perception
The new model MultAtt-RippleNet incorporates a newly added knowledge-aware attention, which embeds the head and relation nodes of each hop in the ripple formed by the user's historically clicked entities into vectors, so that the model can learn the feature relationships between them. Because the tail entities of each hop in a ripple of the KG are reached through different head entities and relations, these embeddings naturally carry a vector representation of the latent semantic relationships among them. The attention network uses a five-layer structure, namely a fully connected layer, a ReLU layer, a fully connected layer, a ReLU layer, and a fully connected layer, followed by a Sigmoid function. The knowledge-attention weight of the tail entity of the $(h, r, t)$ triple in each ripple hop is

$p_i^{hop} = \sigma\!\left(\mathrm{MLP}\big(h_i^{hop} \,\&\, r_i^{hop};\, W, b\big)\right),$

where $p_i^{hop}$ denotes the weight with which entity $i$ inside a ripple hop is connected to the next-hop entity, and $h_i^{hop}$ and $r_i^{hop}$ are the embedding representations of the head entity and relation of that hop layer. Here $W$ and $b$ are the weight and bias parameters of the trained neural network, and $\&$ denotes the concatenation of the two matrices.
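A sketch of this knowledge-aware attention head (FC-ReLU-FC-ReLU-FC followed by a Sigmoid over the concatenated head and relation embeddings); the hidden width and the class name KnowledgeAttention are assumptions, since the paper does not report layer sizes.

```python
import torch
import torch.nn as nn

class KnowledgeAttention(nn.Module):
    """Scores each (h, r, t) triple in a ripple hop: p_i^hop = sigmoid(MLP([h_i ; r_i]))."""

    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),  # fully connected layer
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),   # fully connected layer
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),            # fully connected layer
            nn.Sigmoid(),                        # squash to a (0, 1) attention weight
        )

    def forward(self, head_emb, rel_emb):
        # head_emb, rel_emb: [n_triples, emb_dim] embeddings of h_i^hop and r_i^hop
        x = torch.cat([head_emb, rel_emb], dim=-1)   # the "&" concatenation in the text
        return self.mlp(x).squeeze(-1)               # p_i^hop, shape [n_triples]
```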
These knowledge-aware scores are then normalized within each hop to give the probability that an entity in each user's historical click matrix connects to each of its surrounding neighbors, which naturally incorporates the weight of knowledge perception.
Here we directly replace the $h_i$-$r_i$ relevance score that connects an entity to its surrounding neighbors with the knowledge-aware $p_i^{hop}$, as a more accurate weight for per-layer ripple propagation in the KG and for the aggregation of each hop's tail entities. Experimentally, this weight was found to be more efficient and accurate than the original one.
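Read together with the previous paragraph, the replacement amounts to using $p_i^{hop}$ in place of RippleNet's $v^{\top} R h$ relevance score before normalizing and aggregating the hop's tail entities. A hedged sketch, assuming a softmax normalization over the hop's triples as in RippleNet; the function name is illustrative.

```python
import torch

def aggregate_hop(p_i_hop, tail_emb):
    """Weighted aggregation of one ripple hop.

    p_i_hop : [n_triples] knowledge-aware scores of the hop's triples (replaces v^T R h)
    tail_emb: [n_triples, emb_dim] embeddings of the hop's tail entities
    returns : [emb_dim] the hop's response vector
    """
    weights = torch.softmax(p_i_hop, dim=0)           # normalize over the hop's triples
    return (weights.unsqueeze(-1) * tail_emb).sum(0)  # weighted sum of tail embeddings
```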

Linking user-relationship preferences
The original RippleNet model seriously lacks consideration of each user's preference for different relations, as different users favor movies and books for different reasons. Some users care more about whether a movie stars particular actors or actresses, for example Liu Yifei, Yang Mi, or other popular actresses, while others care more about whether a movie or book belongs to their favorite genre, and this determines whether the user will watch or read it. This is an essential point that should not be overlooked, so the new MultAtt-RippleNet model in this paper links this important factor, user-relation preference, into the knowledge graph KG.
The MultAtt-RippleNet model weights the relations of all the entities in the $(h, r, t)$ triples of each hop layer of the KG, which connect an entity to its surrounding neighbors, with the user's preference. But without a user embedding vector, how can we obtain each user's preference weights for different relations? In this paper we use the idea of abstraction to portray the user: we abstract each user's historical click matrix as that user's characteristics, so that different users can be represented separately, and we use this as the initial embedding vector, which we call the "user abstraction", denoted $U_{abstract}$. Its significance is that we do not directly embed each user according to the usual "everything can be embedded" rule; the core idea of the original paper is to represent the user vector $U$ through other vectors, so there is no regular embedded user vector to use. This paper therefore departs from the conventional approach and breaks through a common pattern.
Each $U_{abstract}$ has a judging weight $p_j$ for each relation, and each hop's head entity is connected to the upper tail entity through a relation, considering both the knowledge-aware weight $p_i$ and the user-relation preference weight $p_j$. The relation preference weight of each hop layer is formulated as

$p_j^{hop} = \pi\!\left(u_{abstract},\, r_j^{hop}\right)$  (7)

where $p_j^{hop}$ is the preference score of the "user abstraction" for the different relations; expanding it, $\pi(u_{abstract}, r_j^{hop})$ is normalized over the relations of the hop to give the score of the "user abstraction" for each relation. The "user abstraction" $u_{abstract}$ itself is produced by the multi-attention module described in the next section.
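A sketch of this relation-preference score, assuming $\pi(\cdot,\cdot)$ is realized as an inner product between the "user abstraction" vector and the relation embedding, followed by a softmax over the hop's relations; this concrete form of $\pi$ is an assumption, as the paper only specifies its inputs.

```python
import torch

def relation_preference(u_abstract, rel_emb):
    """p_j^hop = pi(u_abstract, r_j^hop), normalized over the hop's relations.

    u_abstract: [emb_dim] "user abstraction" vector built from the user's click history
    rel_emb   : [n_triples, emb_dim] relation embeddings r_j^hop of the hop's triples
    """
    scores = rel_emb @ u_abstract        # assumed form of pi: inner product per relation
    return torch.softmax(scores, dim=0)  # normalized preference weights p_j^hop
```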

MultAtt-RippleNet
This paper performs feature fusion based on the two innovations described above, capturing features that are closer to the true user preferences, and then matches the resulting user vector against the vector of the item to be predicted. The weight $P^{k}$ obtained from the dual-feature fusion is

$P^{k} = p_i^{hop} \,\|\, p_j^{hop},$

where $\|$ denotes the fusion of the two feature matrices stitched together. From this we obtain the fused probability of the user's knowledge-perception and relation-preference features at each hop. The final part of the MultAtt-RippleNet model is the prediction part: the user response vectors produced at each hop layer are summed to form the final user embedding vector $u_{real}$, which is compared with the embedding vector $v$ of the item to be predicted to obtain the click probability

$\hat{y}_{uv} = \sigma\!\left(u_{real}^{\top}\, v\right).$

The framework of this paper is shown in Figure 1. Of the three dashed boxes, counted from left to right, the first and second represent the multi-attribute module and the multi-attention module, respectively, and the last represents one iteration of the model over all head entities with hop equal to 2, with the relation-vector decomposition and propagation mechanism repeated for hops 3 to N. The multi-attention module gathers all the head entities with hop = 0 in the matrix $V_u$ of the user's historical clicks and passes them through a Transformer encoder: positional encoding information is added, relevance is computed with multi-head attention, a residual connection with normalization is applied, and a linear mapping with an activation function gives the hidden-layer representation $U_{abstract}$; finally, a pooling layer is added for dimensionality reduction, which also helps prevent overfitting.
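A condensed sketch of the multi-attention module and the final prediction described above: the hop-0 head-entity embeddings of the user's click history receive positional encodings, pass through a Transformer encoder layer, are pooled, and are linearly mapped to give $u_{abstract}$, while the click probability is the sigmoid of the inner product of $u_{real}$ and the candidate item embedding. The class and function names, layer sizes, and the use of mean pooling are assumptions for illustration.

```python
import torch
import torch.nn as nn

class UserAbstraction(nn.Module):
    """Builds u_abstract from the embeddings of the user's historically clicked (hop-0) entities."""

    def __init__(self, emb_dim, n_heads=4, max_len=512):
        super().__init__()
        # emb_dim must be divisible by n_heads for multi-head attention.
        self.pos_emb = nn.Embedding(max_len, emb_dim)              # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=n_heads,
                                           batch_first=True)        # multi-head attention + residual + norm
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.proj = nn.Linear(emb_dim, emb_dim)                      # linear mapping to the hidden layer

    def forward(self, click_emb):
        # click_emb: [batch, n_clicks, emb_dim] embeddings of the hop-0 head entities
        pos = torch.arange(click_emb.size(1), device=click_emb.device)
        x = click_emb + self.pos_emb(pos)        # add position information
        x = self.encoder(x)                      # multi-attention, residual connection, normalization
        x = torch.relu(self.proj(x))             # linear mapping + activation
        return x.mean(dim=1)                     # pooling for dimensionality reduction


def predict_click(u_real, item_emb):
    """Click probability: sigmoid of the inner product of u_real and the candidate item embedding."""
    return torch.sigmoid((u_real * item_emb).sum(-1))
```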

Datasets
For the MultAtt-RippleNet model in this paper, we build knowledge graphs from the following publicly available datasets: MovieLens-1M, which contains roughly one million real user ratings of movies, and Book-Crossing, which contains over one million book rating records.

Experimental Results
As shown in Table 3 below, the MultAtt-RippleNet model achieves better results than the other baseline models on both datasets. Tables 4 and 5 compare it with the RippleNet model, testing the effect of different ripple-set sizes on model performance on the two datasets respectively; the new model achieves its best performance at a ripple-set size of 39 and outperforms the RippleNet model at all sizes. Figure 2 shows the results of comparing the MultAtt-RippleNet model with the other baseline models on the different datasets using the Precision@K and Recall@K evaluation metrics. The results show that the two knowledge-aware modules and the "user abstraction" module that models individual relation preferences introduced by the new model significantly improve the recommendation performance in terms of Precision@K and Recall@K.

Figure 2. The result of Precision@K and Recall@K in the top-K recommendation.

Conclusion
In this paper, the RippleNet model is innovatively improved: where each hop layer's head entities connect to tail entities in the KG, the tail entities with a higher probability of relevance are selected based on knowledge-aware attention when the ripples are passed to the next hop layer, and the user's preference among the relations of each hop layer is modelled through the "user abstraction". The MultAtt-RippleNet model makes use of knowledge-aware attention, self-attention, and a Transformer encoder, and the comparison of the relevant data in the experimental part effectively demonstrates that the trained model performs better.

Table 1. Basic statistics of the two datasets.

We divide each dataset into training, validation, and test sets in a 6:2:2 ratio, run each experiment 5 times, and take the average. The final hyperparameters for both datasets are shown in Table 2 below, where d denotes the embedding dimension and H denotes the number of ripple-propagation hops; the learning rate is also listed. A larger H may reduce performance, because the relevance of the entities reached at each additional hop decreases while the computational overhead increases significantly.
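A minimal sketch of the evaluation protocol described above (6:2:2 split, five runs averaged); the function names and the metric-dictionary convention are illustrative assumptions.

```python
import random

def split_622(samples, seed=0):
    """Shuffle and split interaction records into 60% train, 20% validation, 20% test."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    n = len(data)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

def average_over_runs(run_fn, n_runs=5):
    """Run the experiment n_runs times and report the mean of each metric (e.g. AUC, ACC)."""
    results = [run_fn(seed) for seed in range(n_runs)]
    return {k: sum(r[k] for r in results) / n_runs for k in results[0]}
```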

Table 2. Hyperparameter settings for the two datasets.

Table 3. AUC and ACC results of the compared algorithms on the two datasets.

Table 4. Comparison of AUC under different ripple-set sizes on the MovieLens-1M dataset.

Table 5. Comparison of AUC under different ripple-set sizes on the Book-Crossing dataset.