A new algorithm based on bipartite graph networks for improving aggregate recommendation diversity

Most traditional recommendation algorithms focus on the accuracy of recommendation results. However, the diversity of recommendation results is also important, because it helps to alleviate the long-tail phenomenon. In this paper, a new algorithm for improving aggregate recommendation diversity is proposed. First, a candidate recommendation list is constructed from predicted scores, and a bipartite graph network model is built from it. Second, an item capacity is set to limit the number of times popular items are recommended. Finally, the final recommendation result is generated by searching for recommendation augmenting paths. Experiments on real-world movie rating datasets show that the proposed algorithm effectively preserves the accuracy of recommendation results while improving the aggregate diversity of recommendations.


Introduction
Personalized recommendation technologies help users obtain the information they prefer from large amounts of data, and they have become one of the most effective information filtering technologies [1]. With this rapid development, personalized recommendation has been widely used in e-commerce, film and television, social networks, and other domains [2,3]. In recent years, more and more researchers have realized that accuracy should not be the only criterion for evaluating recommendation quality. The diversity, novelty, and coverage of recommendation results also matter when evaluating recommendation systems [4], and several studies have highlighted the importance of diversity in particular [5,6]. Recommendation diversity is generally divided into personal diversity and aggregate diversity. Personal diversity takes the view of a single user: it aims to recommend a set of items that have low similarity to each other while still matching that user's interests. Aggregate diversity takes the view of the entire system: it aims to improve the system's ability to recommend unpopular items and thus to weaken the long-tail phenomenon. A good recommendation system should take both accuracy and diversity into account. However, accuracy and diversity tend to conflict, so some researchers adopt heuristic strategies that improve recommendation diversity at the expense of some accuracy. Adomavicius et al. [7] model recommendation as a maximum bipartite graph matching problem and improve the coverage of recommended items, but the long-tail phenomenon is not effectively solved. Adomavicius et al. [8] also proposed a re-ranking algorithm that improves aggregate diversity by raising the rank of unpopular items. This method has low computational complexity, but it lacks a global view. Taking a global perspective, Huiting Liu et al. [9] proposed a diversity optimization algorithm based on recommendation expectation, which improves aggregate diversity by controlling the recommendation expectations of all items.
In this paper, we focus on the aggregate diversity of recommendation systems and propose an algorithm based on bipartite graph networks for improving it. First, a rating threshold is set to filter out, for each user, items with low predicted scores; a candidate recommendation list is constructed from the remaining predicted scores, and a bipartite graph network model is then built. Second, an item capacity is set to limit the number of times popular items are recommended. Finally, the final recommendation result is generated by searching for recommendation augmenting paths. On real-world movie rating datasets, the proposed method is compared with mainstream algorithms for improving aggregate recommendation diversity. The proposed algorithm has the following advantages: 1) it can improve the aggregate diversity of most existing recommendation algorithms; 2) by adjusting the threshold, it can flexibly trade off diversity against accuracy. Experiment results show that the proposed algorithm effectively preserves the accuracy of recommendation results while improving aggregate recommendation diversity.

An Algorithm For Improving Recommendation Aggregate Diversity (AD-Improved Algorithm)
The bipartite graph is a special model in graph theory that plays an important role both in complex network research and in practical applications. This section presents the proposed algorithm, which is based on bipartite graph networks and improves aggregate recommendation diversity.

Related Definitions
Let the set of users be $U = \{u_1, u_2, \dots, u_n\}$ and the set of items be $I = \{i_1, i_2, \dots, i_m\}$. Users' ratings of all items are represented by an $n \times m$ matrix $R$, where $R(u, i)$ is the true score of user $u$ for item $i$. Predicted scores for unrated items are represented by another $n \times m$ matrix $R'$, where $R'(u, i)$ is the predicted score of user $u$ for item $i$. The predicted scores can be computed by user-based CF, item-based CF, matrix factorization CF, or other existing algorithms. The recommendation list of user $u$ is denoted $T_u$, and the recommendation list length is $N$. The item capacity, i.e., the number of times an item can still be recommended, is denoted $IC$ ($IC_i$ for item $i$). At the beginning of the recommendation process, all items have the same item capacity.
Definition 1: Candidate recommendation list. Given a rating threshold $T_r$, the candidate recommendation list of user $u$ is defined as $C_u = \{ i \mid i \in I,\ R'(u, i) > T_r \}$. The items in $C_u$ can be regarded as the items the user is predicted to prefer most.
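As an illustration of Definition 1, the following Python sketch builds $C_u$ from a predicted score matrix. It assumes (hypothetically) that $R'$ is stored as a NumPy array with NaN in cells the user has already rated. It also orders each candidate list by descending predicted score, which is convenient for the later selection steps. The function name `candidate_lists` is illustrative only.

```python
import numpy as np


def candidate_lists(pred, tr):
    """Return {user index: [item indices with R'(u, i) > tr, best first]}.

    pred: n x m array of predicted scores R'; NaN marks already-rated items (assumption).
    tr:   rating threshold T_r, applied as a strict inequality as in Definition 1.
    """
    filled = np.where(np.isnan(pred), -np.inf, pred)  # rated items can never qualify
    cands = {}
    for u in range(pred.shape[0]):
        items = np.flatnonzero(filled[u] > tr)               # items above the threshold
        order = np.argsort(-filled[u, items], kind="stable")  # descending predicted score
        cands[u] = items[order].tolist()
    return cands
```

With the strict inequality, an item predicted at exactly $T_r$ is not a candidate.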
Definition 2: Recommendation bipartite graph. $G = \langle U, I, E \rangle$, where $U$ is the set of user nodes, $I$ is the set of item nodes, and $E$ is the set of edges between users and items. An edge connecting user $u$ to an item in $T_u$ is called a matching edge, and an edge connecting user $u$ to an item in $C_u$ is called an unmatching edge.
Definition 3: Recommendation augmenting path. A recommendation augmenting path is a path that starts from a user node with an unmatching edge, then alternates between matching and unmatching edges, and ends at an item node whose item capacity is not 0. In Figure 1, solid lines represent matching edges and dotted lines represent unmatching edges. $u_3 \to i_1$ and $u_1 \to i_2$ are unmatching edges, and $i_1 \to u_1$ is a matching edge. If the item capacity of $i_2$ is not 0, then $u_3 \to i_1 \to u_1 \to i_2$ is a recommendation augmenting path.
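To make Definitions 2 and 3 concrete, the hypothetical helper below checks whether a node sequence is a recommendation augmenting path. It rests on several assumptions:

- Matching edges are stored as the sets `T[u]` and unmatching edges as `C[u]`.
- The path is given as alternating user and item nodes, with no node repeated.

The sample data reproduce only the parts of the Figure 1 example stated in the text, plus an assumed capacity of 1 for $i_2$.

```python
def is_augmenting_path(path, T, C, cap):
    """True if path = [u1, i1, u2, i2, ..., uk, ik] is a recommendation augmenting path.

    T[u]:   items joined to user u by matching edges (its recommendation list)
    C[u]:   items joined to user u by unmatching edges (its candidate list)
    cap[i]: remaining item capacity IC_i
    """
    if len(path) < 2 or len(path) % 2:
        return False
    users, items = path[0::2], path[1::2]
    if len(set(users)) < len(users) or len(set(items)) < len(items):
        return False  # a simple path visits each node at most once
    for j, (u, i) in enumerate(zip(users, items)):
        if i not in C[u]:  # user -> item steps must use unmatching edges
            return False
        if j + 1 < len(users) and i not in T[users[j + 1]]:
            return False  # item -> next user steps must use matching edges
    return cap[items[-1]] > 0  # the final item must still have capacity


# Partial Figure 1 state; cap["i2"] = 1 is an assumed value ("not 0" in the text).
T = {"u1": {"i1"}, "u3": set()}
C = {"u1": {"i2"}, "u3": {"i1", "i5"}}
cap = {"i1": 0, "i2": 1, "i5": 0}
print(is_augmenting_path(["u3", "i1", "u1", "i2"], T, C, cap))  # True
```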

Main Idea of the Algorithm
The recommendation process is usually divided into two phases [10,11]. The first predicts the user's scores for unrated items, and the second generates the Top-N recommendation list. Accordingly, there are two common ways to improve recommendation diversity. One is to intervene in the score prediction phase. The other is to optimize the recommendation generation phase by searching for an optimal combination of recommendations. This paper adopts the second approach.

Consider a user whose recommendation list is not yet full, but all of whose candidate items have already reached their maximum number of recommendations. Adding the item with the highest $R'(u, i)$ in $C_u$ to $T_u$ helps recommendation accuracy but does not help aggregate diversity. Conversely, recommending an unpopular item with a large remaining $IC$ to the user helps aggregate diversity but not accuracy. The proposed algorithm instead searches for a recommendation augmenting path starting from the user node, and swaps the matching and unmatching edges along that path.
For example, in Figure 1, suppose that each user receives two recommendations and the item capacity IC is 2. At this point, the recommendation list of user $u_3$ is not full, and the items $i_1$ and $i_5$ in its candidate list both have IC = 0. Our algorithm finds the augmenting path $u_3 \to i_1 \to u_1 \to i_2$. As a result, $i_1$ is recommended to $u_3$ instead of $u_1$, and $i_2$ is recommended to $u_1$. This fills a slot in $u_3$'s recommendation list and also increases the number of recommendations of the unpopular item $i_2$. Moreover, since $R'(u_1, i_2) > T_r$, user $u_1$ still has a high predicted score for $i_2$. The algorithm therefore increases aggregate diversity while maintaining good accuracy. In addition, the algorithm can be combined with any recommendation algorithm that has a score prediction step to improve its aggregate diversity.
The AD-Improved algorithm consists of three steps: 1) create a predictive scoring matrix; 2) build the recommendation bipartite graph; 3) generate the recommendation results.

Create a Predictive Scoring Matrix
User-based CF, item-based CF, matrix factorization CF, and other existing recommendation algorithms predict users' scores on unrated items, then select the items with the highest predicted scores to build the Top-N recommendation list. Our algorithm uses such an existing recommendation algorithm to generate the predicted score matrix $R'$.
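Because the method only needs some score predictor, any CF model can supply $R'$. The sketch below uses one common variant of user-based CF: mean-centred scores, cosine similarity, and the top-$k$ most similar neighbours among users who rated the item. This is an assumption for illustration, not necessarily the configuration used in the experiments, and `predict_user_based` is a hypothetical name. Unrated entries are stored as 0 in the input. Rated entries come back as NaN, reflecting that $R'$ is only needed for unrated items.

```python
import numpy as np


def predict_user_based(R, k=50):
    """Predict R'(u, i) for every unrated cell of R (0 = unrated); rated cells get NaN."""
    rated = R > 0
    counts = rated.sum(axis=1)
    means = R.sum(axis=1) / np.maximum(counts, 1)            # each user's mean score
    centered = np.where(rated, R - means[:, None], 0.0)       # mean-centred, 0 if unrated
    norms = np.linalg.norm(centered, axis=1)
    sim = (centered @ centered.T) / np.maximum(np.outer(norms, norms), 1e-12)  # cosine
    np.fill_diagonal(sim, 0.0)                                # a user is not its own neighbour
    n, m = R.shape
    pred = np.full((n, m), np.nan)
    for u in range(n):
        for i in range(m):
            if rated[u, i]:
                continue
            raters = np.flatnonzero(rated[:, i])
            top = raters[np.argsort(-sim[u, raters])[:k]]    # k most similar raters of item i
            weights = sim[u, top]
            denom = np.abs(weights).sum()
            # fall back to the user's mean when no usable neighbour exists
            pred[u, i] = means[u] + (weights @ centered[top, i]) / denom if denom > 0 else means[u]
    return pred
```

The default `k=50` mirrors the neighbourhood size reported in the experiments.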

Build Recommendation Bipartite Graph
The bipartite graph is constructed as follows. First, the score threshold $T_r$ is set, and the candidate recommendation list $C_u$ of each user is obtained from the predicted score matrix $R'$. Second, each user $u$ is connected to every item in $C_u$ by an unmatching edge. The resulting recommendation bipartite graph is the initial state of the system.

Generate Recommendation Results
Generating recommendations for a user involves three cases.
1) At least one item in $C_u$ has a nonzero item capacity. To ensure recommendation accuracy, the item $i$ with the highest predicted score among these items is selected. Item $i$ is added to $T_u$ and removed from $C_u$. Since item $i$ has now been recommended once more, its item capacity is updated as $IC_i \leftarrow IC_i - 1$. Finally, the edge between user $u$ and item $i$ is changed from an unmatching edge to a matching edge.
2) When all items in $C_u$ have item capacity 0, a breadth-first search starting from the user node is used to find a recommendation augmenting path. Such a path starts at a user node and ends at an item node. Write it as $p = v_1 \to v_2 \to \cdots \to v_{2k-1} \to v_{2k}$, where $v_{2j-1}$ ($1 \le j \le k$) are user nodes and $v_{2j}$ are item nodes. Each edge $v_{2j-1} \to v_{2j}$ is an unmatching edge, and each edge $v_{2j} \to v_{2j+1}$ ($j < k$) is a matching edge. Along the path, every unmatching edge $v_{2j-1} \to v_{2j}$ becomes a matching edge, and every matching edge $v_{2j} \to v_{2j+1}$ becomes an unmatching edge. The users' lists are updated to match:
- the item $v_{2j}$ is added to $T$ and removed from $C$ of user $v_{2j-1}$;
- for $j < k$, the item $v_{2j}$ is removed from $T$ of user $v_{2j+1}$ and returned to that user's $C$.

Finally, the item capacity of the last item $v_{2k}$ is decreased by one: $IC_{v_{2k}} \leftarrow IC_{v_{2k}} - 1$.
3) All items in $C_u$ have item capacity 0, and no recommendation augmenting path starts from the user node. In this case the aggregate diversity of the system cannot be improved at this step. To preserve accuracy, the item with the highest predicted score in $C_u$ is moved from $C_u$ to $T_u$, and its item capacity is left unchanged.
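The three cases can be combined into a single loop. The Python sketch below is one possible reading of the procedure, not the authors' implementation. It makes the following assumptions:

- Predicted scores for unrated items are given as a nested dict `pred[u][i]`.
- Users are served round-robin, one slot per user per round (the text does not specify an order).
- Ties are broken by item id, and ids are sortable.
- The names `find_augmenting_path`, `augment`, `_best`, and `ad_improved` are hypothetical.

```python
from collections import deque


def find_augmenting_path(start, cand, holders, cap):
    """BFS from user `start`; returns [u1, i1, u2, i2, ..., uk, ik] or None.

    cand[u]: items on unmatching edges of u (C_u); holders[i]: users with i in T_u.
    """
    item_parent = {}             # item -> user that reached it via an unmatching edge
    user_parent = {start: None}  # user -> item that reached it via a matching edge
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for i in sorted(cand[u]):          # follow unmatching edges u -> i
            if i in item_parent:
                continue
            item_parent[i] = u
            if cap[i] > 0:                 # path ends at an item with spare capacity
                path = [i]
                while True:                # walk parent pointers back to `start`
                    v = item_parent[path[-1]]
                    path.append(v)
                    if user_parent[v] is None:
                        return path[::-1]
                    path.append(user_parent[v])
            for v in sorted(holders[i]):   # follow matching edges i -> v
                if v not in user_parent:
                    user_parent[v] = i
                    queue.append(v)
    return None


def augment(path, cand, rec, holders, cap):
    """Case 2: swap matching and unmatching edges along the path."""
    for j in range(0, len(path), 2):
        u, i = path[j], path[j + 1]
        cand[u].discard(i)                 # unmatching -> matching
        rec[u].add(i)
        holders[i].add(u)
        if j + 2 < len(path):
            v = path[j + 2]
            rec[v].discard(i)              # matching -> unmatching
            holders[i].discard(v)
            cand[v].add(i)
    cap[path[-1]] -= 1                     # only the last item uses up capacity


def _best(scores, pool):
    return max(pool, key=lambda x: (scores[x], x))  # highest R'(u, i), ties by item id


def ad_improved(pred, tr, n, ic):
    cand = {u: {i for i, s in sc.items() if s > tr} for u, sc in pred.items()}  # C_u
    rec = {u: set() for u in pred}                                             # T_u
    items = {i for sc in pred.values() for i in sc}
    holders = {i: set() for i in items}
    cap = {i: ic for i in items}
    for _ in range(n):                     # assumed round-robin: one slot per user per round
        for u in pred:
            if len(rec[u]) >= n or not cand[u]:
                continue
            open_items = [i for i in cand[u] if cap[i] > 0]
            if open_items:                 # case 1: most accurate item with capacity left
                i = _best(pred[u], open_items)
                cand[u].discard(i)
                rec[u].add(i)
                holders[i].add(u)
                cap[i] -= 1
                continue
            path = find_augmenting_path(u, cand, holders, cap)
            if path is not None:           # case 2: flip the augmenting path
                augment(path, cand, rec, holders, cap)
            else:                          # case 3: best item, IC left unchanged
                i = _best(pred[u], cand[u])
                cand[u].discard(i)
                rec[u].add(i)
                holders[i].add(u)
    return rec


if __name__ == "__main__":
    pred = {"u1": {"i1": 4.8, "i2": 4.2}, "u2": {"i3": 4.9},
            "u3": {"i1": 4.6}, "u4": {"i3": 4.2}}
    print(ad_improved(pred, tr=4.0, n=1, ic=1))
    # {'u1': {'i2'}, 'u2': {'i3'}, 'u3': {'i1'}, 'u4': {'i3'}}
```

Here is what happens on this toy input. First, $u_1$ greedily takes $i_1$ and $u_2$ takes $i_3$. When $u_3$'s only candidate $i_1$ is full, case 2 finds $u_3 \to i_1 \to u_1 \to i_2$. It hands $i_1$ to $u_3$ and gives $u_1$ the otherwise unused $i_2$, so all three items are covered. For $u_4$ no augmenting path exists, so it falls back to case 3.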

Experiments
In this paper, we use the MovieLens 1M dataset (http://www.grouplens.org), which is widely used in recommendation research, as test data. The proposed algorithm, denoted AD-Improved, is compared with the Expectation algorithm [9]. The pre-processed dataset contains 775,176 ratings given by 2,830 users to 1,919 items, so the rating density (proportion of observed user-item entries) is 14.27%. The most popular collaborative filtering techniques (user-based CF, item-based CF, and matrix factorization CF) are used to predict users' scores on unrated items and thus obtain the candidate recommendation lists. For user-based and item-based CF, the 50 most similar neighbours are selected as nearest neighbours. For matrix factorization CF, the regularization constant is 0.043, the number of latent factors is 15, and the number of iterations is 30; 5-fold cross-validation is used. Statistics of the dataset show that 70% of the items were rated fewer than 50 times. Accordingly, we set the item capacity IC to 50.
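The reported 14.27% can be checked directly from the dataset size. It is the share of observed entries in the 2,830 × 1,919 rating matrix (its density), so roughly 85.7% of the matrix is empty. The short Python check below only reproduces that arithmetic.

```python
users, items, ratings = 2830, 1919, 775_176
cells = users * items         # number of user-item pairs in the rating matrix
density = ratings / cells     # share of pairs that carry an observed rating
print(cells)                  # 5430770
print(f"{density:.2%}")       # 14.27%
```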

Evaluation Indicators
In this paper, recommendation accuracy is used as an evaluation indicator alongside the aggregate diversity measures below. Cross-validation tests of different recommendation algorithms on the Netflix and MovieLens datasets show that the correlations between prediction accuracy and the average predicted score are 0.974 and 0.966, respectively. Therefore, the average predicted score of the recommended items can be used to estimate recommendation accuracy. For aggregate diversity, a larger coverage value means that the system covers more items and has better aggregate diversity. Gini-Diversity and Entropy measure aggregate diversity quantitatively: a smaller Gini-Diversity value means better diversity, and a greater Entropy value means better diversity. In these measures, rec(i) denotes the number of times item i is recommended to different users.
Figure 2 compares our algorithm (AD-Improved) with the Expectation algorithm (Expectation) [9] for score thresholds $T_r \in \{3.5, 3.6, \dots, 5\}$, each built on user-based CF, item-based CF, and matrix factorization CF. Top-5 recommendation results are used to compare accuracy and diversity.

As Figure 2a shows, the algorithm based on item-based CF can flexibly control the trade-off between coverage and accuracy: when $T_r$ is 4.5, accuracy decreases by no more than 5%. Moreover, compared with the Expectation algorithm, our algorithm has a small advantage in both accuracy and coverage. As Figure 2b shows, at the same threshold the Gini coefficient of our algorithm based on item-based CF is smaller than that of the Expectation algorithm, giving it a slight advantage in diversity. In terms of accuracy, our algorithm also performs well: the drop in recommendation accuracy is only about 3% when $T_r \in [4, 4.6]$. Figure 2c shows that the algorithm based on matrix factorization CF outperforms the Expectation algorithm in both accuracy and diversity.

To compare further with the Expectation algorithm and to show the relationship between accuracy and diversity, Table 1 reports the diversity achieved in Top-5 recommendation as the accuracy rate is adjusted (through the score threshold).
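The formulas for these indicators did not survive in this text. As a hedged illustration, the Python sketch below computes the indicators using definitions common in the aggregate-diversity literature, which may differ from the paper's exact normalization:

- Coverage is the number of distinct recommended items.
- Entropy is the Shannon entropy, with natural log, of the shares $rec(i)/\sum_j rec(j)$.
- Gini-Diversity is a Gini index over all $|I|$ items sorted by $rec(i)$, normalized by $|I| - 1$.

The sketch also reports the average predicted score of the recommended items, which the text uses as an accuracy estimate. The helper names are hypothetical.

```python
import math
from collections import Counter


def aggregate_diversity(rec, num_items):
    """Coverage, Gini-Diversity and Entropy of a set of Top-N lists.

    rec:       {user: iterable of recommended items}; must contain at least one item
    num_items: |I|, the total number of items (at least 2)
    """
    counts = Counter(i for items in rec.values() for i in items)  # rec(i)
    total = sum(counts.values())
    coverage = len(counts)  # distinct items recommended at least once
    entropy = -sum(c / total * math.log(c / total) for c in counts.values())
    # Gini over all |I| items, never-recommended ones included, shares sorted ascending
    shares = sorted([0.0] * (num_items - coverage) + [c / total for c in counts.values()])
    gini = sum((2 * j - num_items - 1) * p
               for j, p in enumerate(shares, start=1)) / (num_items - 1)
    return coverage, gini, entropy


def mean_predicted_score(rec, pred):
    """Average R'(u, i) over all recommended (u, i) pairs, the accuracy proxy in the text."""
    scores = [pred[u][i] for u, items in rec.items() for i in items]
    return sum(scores) / len(scores)
```

Under these definitions, spreading recommendations evenly over all items gives a Gini of 0 and the maximum entropy $\ln |I|$. Concentrating every recommendation on one item gives a Gini of 1 and an entropy of 0.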