A novel recommendation algorithm based on knowledge graphs

Personalized recommendation is an important topic in recommender system research. Traditional collaborative filtering algorithms suffer from data sparsity and cold-start problems, and existing knowledge-graph-based recommendation algorithms miss high-order similarity at the subgraph level. This paper introduces an RNN-based distributed representation model for knowledge graphs called KG-GRU, which uses sequences of nodes and relations to model subgraph similarity in a shared embedding space. On this basis, a personalized knowledge-graph-based recommendation algorithm, KG-GRU4Rec, is proposed; it implements an end-to-end model that predicts user ratings. Experimental results demonstrate that KG-GRU learns more accurate representations of the entities and relations in the graph, and that KG-GRU4Rec outperforms the baseline algorithms in both hit ratio and mean reciprocal rank.


Introduction
There are several typical categories of personalized recommendation technology: rule-based recommendation, collaborative filtering [1], content-based recommendation, social-network-based recommendation [2], and knowledge-graph-based recommendation [3].
Association rule mining is an unsupervised machine learning method that uncovers hidden associations in a target dataset to describe rules and patterns among items; its main disadvantage is that the model and the mined rules are difficult to evaluate. Collaborative filtering, while exploiting common interests among users, tends to recommend popular items, resulting in a lack of novelty in recommendations.
In recent years, recommender systems [4] have come to rely on knowledge graphs that integrate multi-source heterogeneous data to improve recommendation quality. However, traditional distributed representation methods for graphs miss high-order similarity at the subgraph level when learning graph structure. This paper introduces an RNN-based distributed representation model for knowledge graphs called KG-GRU, which uses multiple paths of entities and relations to model subgraph similarity and represents relations and entities in the same embedding space. The paper further adopts JUST, a jump-or-stay strategy that guides random walks and avoids the manual construction of meta-paths. Finally, we present a novel knowledge-graph-based recommendation algorithm, KG-GRU4Rec, an end-to-end model that predicts user ratings and alleviates the sparsity and cold-start issues.
The main contributions of this paper are twofold: (1) an RNN-based distributed representation model for knowledge graphs, with a jump-or-stay strategy that guides random walks to sample the graph; (2) an end-to-end knowledge-graph-based recommendation algorithm, KG-GRU4Rec, that predicts user ratings.

Distributed representation of knowledge graph
KG-GRU model framework
KG-GRU, an RNN-based distributed representation model for knowledge graphs, consists of two modules: a subgraph extraction module and an RNN training module, shown in Figures 1 and 2 respectively. The RNN is implemented with gated recurrent units (GRU) [5]. The subgraph extraction module adopts the JUST strategy to generate one-hop and multi-hop paths for target node pairs, which are then batched into the RNN. KG-GRU optimizes the representation vectors and the weight parameters by maximizing subgraph similarity.
Let E = {e_1, e_2, …, e_n} be the entity set, R = {r_1, r_2, …, r_m} the relation set, and G = (E, ℰ, R) the knowledge graph, where ℰ is the edge set and |ℰ| = g. Each edge ε ∈ ℰ corresponds to an element of the relation set, ψ(ε) ∈ R with m ≪ g, where ψ(·) is the relation mapping function; each entity e ∈ E belongs to some ontology q ∈ Q, that is, ϕ(e) ∈ Q, where ϕ(·) is the ontology mapping function. The goal of KG-GRU is to map the knowledge graph into a low-dimensional vector space while retaining its semantic and structural information, yielding vector representations of entities and relations: e_i → E_i ∈ ℝ^d for all i ∈ [n] and r_j → R_j ∈ ℝ^d for all j ∈ [m]. On this basis, the representation vectors can be used directly to compute the semantic similarity of entities in the knowledge graph and applied in downstream intelligent scenarios.
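As a minimal illustration of this shared embedding space (the entity and relation names, the dimension d, and the random initialization below are all hypothetical, not the paper's data), entities and relations can be stored as vectors in one space and compared directly:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Toy vocabulary: entities and relations are embedded in the SAME
# d-dimensional space, as KG-GRU requires for mixed entity/relation paths.
entities = ["user_1", "movie_1", "actor_1"]
relations = ["rated", "starring"]
E = {e: rng.normal(size=d) for e in entities}   # entity vectors
R = {r: rng.normal(size=d) for r in relations}  # relation vectors

def cosine(a, b):
    """Semantic similarity of two representation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(E["user_1"], E["movie_1"])
```

After training, such similarities would reflect graph structure rather than random initialization.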

Knowledge subgraph extraction
Given two entities s, o ∈ E, the subgraph G_{s,o} ⊆ G contains p_{s,o} ∪ P_{s,o}, where p_{s,o} = (s, r, o) is a single triple, i.e. a one-hop path, and P_{s,o} = {p_1, p_2, …, p_k} is the multi-hop path set, with each p_i = {s, r_1, e_1, r_2, e_2, …, o}. If no path connects s and o, then P_{s,o} = ∅. Let the current node be e_i and define E_stay(e_i) = {e | (e_i, e) ∈ ℰ, ϕ(e) = ϕ(e_i)} and E_q(e_i) = {e | (e_i, e) ∈ ℰ, ϕ(e) = q}. The JUST strategy applies the following rules: (1) Jump or stay. The probability of staying in the same ontology is α^l, where α ∈ [0, 1] is the initial stay probability and l is the number of entities visited consecutively in the same ontology. If no neighboring entity belongs to the same ontology as e_i, JUST jumps; if no neighboring entity belongs to a different ontology, it stays. Otherwise, JUST stays with the exponentially decaying probability α^l, and the next entity is selected uniformly from E_stay(e_i).
(2) If jumping, which ontology to jump to. To balance the distribution of sampled nodes over ontologies, the m most recently used ontologies are excluded when selecting the next ontology q. Let the queue hist of length m hold the m most recently used ontologies in FIFO order. The ontology q of the next entity is then selected uniformly from {ϕ(e) | (e_i, e) ∈ ℰ} \ hist.
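The jump-or-stay rules above can be sketched as follows; the toy graph, ontology labels, and parameter defaults are illustrative assumptions, not the paper's implementation:

```python
import random

# Hypothetical toy graph: users, movies, and actors with undirected edges.
graph = {
    "u1": ["m1", "m2"], "m1": ["u1", "a1"],
    "m2": ["u1", "a1"], "a1": ["m1", "m2"],
}
ontology = {"u1": "user", "m1": "movie", "m2": "movie", "a1": "actor"}

def just_walk(start, length, alpha=0.5, m=1, seed=0):
    """One JUST-guided random walk of `length` nodes from `start`."""
    rnd = random.Random(seed)
    walk, hist, l, cur = [start], [], 1, start
    for _ in range(length - 1):
        nbrs = graph[cur]
        stay = [e for e in nbrs if ontology[e] == ontology[cur]]
        jump = [e for e in nbrs if ontology[e] != ontology[cur]]
        if not stay:                      # no same-ontology neighbour: must jump
            pool = jump
        elif not jump:                    # no other-ontology neighbour: must stay
            pool = stay
        elif rnd.random() < alpha ** l:   # stay with exponentially decaying prob.
            pool = stay
        else:
            pool = jump
        # prefer ontologies outside the FIFO history of the m most recent ones
        fresh = [e for e in pool if ontology[e] not in hist]
        nxt = rnd.choice(fresh or pool)
        l = l + 1 if ontology[nxt] == ontology[cur] else 1
        hist = (hist + [ontology[cur]])[-m:]
        cur = nxt
        walk.append(cur)
    return walk

walk = just_walk("u1", 5)
```

In KG-GRU the walked node sequence would be interleaved with the relations on each traversed edge before being fed to the GRU.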

KG-GRU training process
The KG-GRU embedding layer learns a distributed representation for each element of a path, and the GRU hidden layer learns a representation of the entire path. Suppose KG-GRU is processing the t-th element of a path; the state and output of the hidden layer at step t are given by the standard GRU update:

z_t = σ(W_z x_t + U_z h_{t-1}),
r_t = σ(W_r x_t + U_r h_{t-1}),
h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1})),
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t,

where x_t is the representation vector of the current element, h_{t-1} is the hidden state passed down from the previous element, which encodes the preceding path, and h̃_t is the candidate hidden state of the current element. For a single path, the GRU outputs the representation vector of the entire path at the hidden layer of its last element. In G_{s,o}, the similarity between a multi-hop path and the one-hop path is the dot product of their vectors: s(p_i, p_{s,o}) = p_i · p_{s,o}. We can therefore compute the similarity between every multi-hop path in P_{s,o} and the one-hop path, collected in the set {s_1, s_2, …, s_k}. To model subgraph similarity, we aggregate these scores with the Log-Sum-Exp function:

s(P_{s,o}, p_{s,o}) = log Σ_{i=1}^{k} exp(s_i).

To optimize subgraph similarity, the loss function of the subgraph is the margin-based hinge loss

L = Σ_{(s′, r, o′) ∈ Δ′} max(0, γ − s(P_{s,o}, p_{s,o}) + s(P_{s′,o′}, p_{s′,o′})),

where γ is a predefined margin, max(0, ·) is the hinge loss function, and Δ′ is the negative sample set associated with (s, r, o), obtained by corrupting the head or tail entity of the triple. To improve training efficiency and capture the first-order similarity of entities and relations, we initialize the representation vectors with TransE [6].

As shown in Figure 3, the input to KG-GRU4Rec is a set of multi-hop paths in the knowledge graph, generated by random walks guided by the JUST strategy; the output is the predicted user rating. We treat recommendation as a binary classification problem, so KG-GRU4Rec uses binary cross-entropy as the training loss.
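A minimal numerical sketch of the Log-Sum-Exp subgraph similarity and the margin-based hinge loss (the vector dimension, random path vectors, and negative-sampling setup below are illustrative assumptions):

```python
import numpy as np

def subgraph_similarity(path_vecs, p_so):
    """Log-Sum-Exp over dot products between each multi-hop path vector
    and the one-hop path vector p_so (a smooth maximum over path scores)."""
    scores = np.array([p @ p_so for p in path_vecs])
    m = scores.max()  # shift for numerical stability
    return float(m + np.log(np.exp(scores - m).sum()))

def hinge_loss(pos_sim, neg_sims, gamma=1.0):
    """Margin-based ranking loss: the positive subgraph should score at
    least gamma higher than every corrupted (negative-sample) subgraph."""
    return float(sum(max(0.0, gamma - pos_sim + s) for s in neg_sims))

rng = np.random.default_rng(0)
p_so = rng.normal(size=4)                   # one-hop path vector
paths = [rng.normal(size=4) for _ in range(3)]  # multi-hop path vectors
pos = subgraph_similarity(paths, p_so)
loss = hinge_loss(pos, [pos - 2.0])  # negative already 2 below: zero loss
```

Because Log-Sum-Exp upper-bounds the maximum score, optimizing it pulls the best-matching paths toward the one-hop triple without hard argmax selection.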

KG-GRU4Rec score prediction
While the attention mechanism [7] weighs the importance of different elements within one path, we use pooling operations to distinguish the importance of different paths. Let the final output of path p_i through the GRU hidden layer be h_i; max pooling over the n paths is defined dimension-wise as h[m] = max_{1≤i≤n} h_i[m], where h_i[m] is the element at the m-th dimension. Finally, KG-GRU4Rec uses a fully connected layer to quantify the degree of association between user u and item v: ŷ(u, v) = σ(wᵀh + b), where w is the regression coefficient vector, b is the bias term, and σ(·) is the sigmoid function, which constrains the model output to the range [0, 1].
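The pooling and scoring step can be sketched as follows; the hidden vectors, weights, and dimensionality are hypothetical placeholders for GRU outputs:

```python
import numpy as np

def score(path_hiddens, w, b, pooling="max"):
    """Pool n path representations dimension-wise, then map the pooled
    vector to a [0, 1] association score with a sigmoid output layer."""
    H = np.stack(path_hiddens)                     # shape (n_paths, d)
    h = H.max(axis=0) if pooling == "max" else H.mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-(w @ h + b))))

rng = np.random.default_rng(0)
hiddens = [rng.normal(size=6) for _ in range(4)]  # stand-ins for GRU outputs
w, b = rng.normal(size=6), 0.0
y_max = score(hiddens, w, b, pooling="max")
y_avg = score(hiddens, w, b, pooling="avg")
```

Switching the `pooling` argument reproduces the max/average comparison discussed in the experiments: average pooling lets every path contribute, while max pooling keeps only the strongest evidence per dimension.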
[Algorithm 1: KG-GRU4Rec training. The JUST strategy mines the associated path set P(u, v); the steps then repeat until the model converges, with the representation vectors normalized as e ← e/‖e‖ and r ← r/‖r‖ at each iteration.]
We evaluate with the hit ratio (HR) and the mean reciprocal rank (MRR):

MRR = (1/n) Σ_u Σ_{v ∈ R(u)} 1 / rank(u, v),

where R(u) is the user's movie test set, n is the total number of test users, and rank(u, v) is the rank of positive sample v from the test set in the recommendation list.
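The evaluation metrics can be computed as in this sketch, assuming one held-out positive item per user (an illustrative simplification of the test protocol):

```python
def hr_and_mrr(ranked_lists, positives, k=10):
    """Hit ratio@k and mean reciprocal rank over per-user recommendation lists.
    ranked_lists: {user: [item, ...]} ordered best-first;
    positives:    {user: item} the held-out positive item for that user."""
    hits, rr = 0, 0.0
    for u, ranking in ranked_lists.items():
        pos = positives[u]
        if pos in ranking[:k]:
            hits += 1                           # counted toward HR@k
        if pos in ranking:
            rr += 1.0 / (ranking.index(pos) + 1)  # reciprocal of 1-based rank
    n = len(ranked_lists)
    return hits / n, rr / n

hr, mrr = hr_and_mrr({"u1": ["a", "b", "c"], "u2": ["c", "a"]},
                     {"u1": "b", "u2": "c"})
# u1's positive ranks 2nd (rr 1/2), u2's ranks 1st (rr 1) -> MRR 0.75
```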

Experimental results and analysis
The baseline models are MostPopular and KPRN [9]. The average results of the Top-10 recommendation task for each model are shown in Table 1. MostPopular, a non-personalized strategy, performs worse than the other models. KPRN, which first scores each path and then pools the scores with weights, is inferior to the proposed KG-GRU4Rec. To distinguish the importance of different paths between user and movie nodes, KG-GRU4Rec applies two pooling operations to the path vectors output by the GRU: max pooling and average pooling. Experimental results show that average pooling is more effective and better suited to Top-N recommendation on implicit interaction data. This is consistent with the intuition that user preferences for items are determined by a combination of factors rather than a single dominant path.

Conclusion
This paper focuses on personalized recommendation based on knowledge graphs. By mining high-order similarity at the subgraph level, we present an RNN-based distributed representation model, KG-GRU, which samples the knowledge graph with the JUST strategy. Building on this, we design a novel recommendation algorithm, KG-GRU4Rec, with added diversity and interpretability. Experimental results demonstrate that it outperforms the competing methods.