A recommendation method of teaching resources based on similarity and ALS

This paper proposes a collaborative filtering recommendation algorithm, implemented on the Spark platform, that combines item similarity with alternating least squares (ALS) collaborative filtering. The method aims to make prediction computation more efficient and to reduce system response time. When rating data are sparse, existing collaborative filtering schemes build inaccurate models, so they fail to recommend suitable online education and teaching resources to different users. To address this, the proposed method optimizes the alternating least squares collaborative filtering algorithm on the Spark big data analytics platform and runs it in parallel. This increases the amount of work completed per unit time and improves recommendation accuracy, which mitigates the inaccurate recommendation of teaching resources.


Introduction
During the epidemic, many education and teaching platforms have emerged one after another, and online education and teaching resources are growing rapidly. However, the sharp increase in the volume of these resources creates problems for both teaching organizers and learners, who must spend considerable time and effort screening out the resources that meet their needs. As a result, personalized recommendation systems, which are widely used in business, have gradually begun to be applied in education. Such a system uses each user's historical behavior data to perform personalized calculations, discover the interests of different users, and guide users toward the information and education and teaching resources they need, which greatly improves their work and learning efficiency.
A recommendation algorithm is a method of information filtering. It recommends personalized information according to users' information needs, personal interests, and similar factors, and it has been applied successfully to online video, social platforms, online music, e-commerce, and other fields. As smart education develops and its education and teaching resources become richer, making personalized recommendations from resources such as e-books, teaching videos, and documents can help improve students' learning efficiency.
The most popular recommendation algorithm at present is collaborative filtering (CF) [1]. As the name suggests, collaborative filtering involves two operations: collaboration and filtering. Collaboration means using the behavior of a group to make recommendations. Biology has the notion of co-evolution, in which a group gradually evolves toward a better state through cooperation; similarly, in a recommendation system, the recommendations made to users become more accurate as users continue to contribute behavior data. Filtering means selecting the options a user is likely to prefer from the feasible candidate recommendations. Concretely, collaborative filtering uses group behavior to measure similarity, either between users or between items, and then makes decisions and recommendations for users based on that similarity.
Collaborative filtering is generally divided into three types: user-based collaborative filtering [2], item-based collaborative filtering [3], and model-based collaborative filtering. User-based collaborative filtering relies on the similarity between users: it finds users similar to the target user, uses the education and teaching resources those users like to predict the target user's ratings of the corresponding resources, and recommends the several resources with the highest predicted ratings. Item-based collaborative filtering works analogously but computes the similarity between education and teaching resources instead: given the resources the target user already prefers, it predicts the user's preference for highly similar resources and recommends the several with the highest scores.
For example, after a user buys a book on machine learning online, the website immediately recommends a number of other books on machine learning and big data. This is an application of item-based collaborative filtering.
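To make the user-based prediction step concrete, the following Python sketch predicts a target user's rating of an unrated resource as a similarity-weighted average of the ratings given by the most similar users, and then recommends the top-scoring resource. It is a minimal illustration under stated assumptions: the ratings, the names `ratings`, `cosine`, `predict` and `recommend`, and the neighbourhood size `k` are hypothetical, and cosine similarity over rating vectors is only one common choice of user similarity.

```python
import math

# Hypothetical toy ratings: user -> {resource: rating}. Not data from the paper.
ratings = {
    "u1": {"r1": 5, "r2": 3, "r3": 4},
    "u2": {"r1": 4, "r2": 3, "r3": 5, "r4": 4},
    "u3": {"r1": 1, "r2": 5, "r4": 2},
}


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings of item."""
    neighbours = [
        (cosine(ratings[user], ratings[v]), ratings[v][item])
        for v in ratings
        if v != user and item in ratings[v]
    ]
    neighbours.sort(reverse=True)  # most similar users first
    top = neighbours[:k]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total > 0 else 0.0


def recommend(user, n=1):
    """Rank resources the user has not rated by their predicted rating."""
    candidates = {i for rated in ratings.values() for i in rated} - set(ratings[user])
    scored = [(predict(user, i), i) for i in candidates]
    return [i for _, i in sorted(scored, reverse=True)[:n]]


print(recommend("u1"), round(predict("u1", "r4"), 2))
```

Here u1 is more similar to u2 than to u3, so the predicted rating for r4 (about 3.2) lies closer to u2's rating of 4 than to u3's rating of 2.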
However, many recommendation algorithms have low accuracy. Applying collaborative filtering directly, without an understanding of the course content, can produce recommendations of outdated education and teaching resources that have little relevance to the planned course tasks. It is therefore necessary to construct a personalized recommendation method that recommends relevant education and teaching resources as accurately as possible.

Introduction to the Spark platform
Spark is an open-source distributed computing platform that provides data processing, SQL, and machine learning libraries, works with distributed file systems and databases, and allows developers to keep data in memory [4]. Its directed acyclic graph (DAG) execution engine supports data flow and in-memory computation, which greatly speeds up program execution. Spark also provides efficient data manipulation operations that make developing parallel applications straightforward.
Spark provides two main abstractions: the resilient distributed dataset (RDD) and shared variables. The RDD is one of Spark's core concepts. It represents a collection of records of some type, partitioned across different nodes of the same cluster. RDDs are fault-tolerant: when a node or task fails, Spark can recompute the lost partitions. This fault tolerance covers failures caused by the network, hardware, and similar problems, not errors in the program code. Shared variables are variables that can be shared across all nodes. They exist because, when Spark runs a function in parallel, each node works on its own copy of every variable the function uses, so a change made on one node does not affect the copy on another node [5].
MLlib (Machine Learning Library) is Spark's parallel, fast machine learning library. It includes commonly used machine learning algorithms together with related data generators and tests, and many iterative machine learning algorithms perform very well on Spark. The main algorithm families in MLlib include collaborative filtering, regression, clustering, and classification. Personalized recommendation engines often use collaborative filtering because it can infer items that users might like. For collaborative filtering, MLlib currently supports only the model-based ALS algorithm. ALS uses matrix factorization to reduce the size of the matrices involved and takes into account the latent factors that relate products and users; these factors help infer user preferences more accurately [6].

Item-based Collaborative Filtering
Item-based collaborative filtering recommends to a target user items that are similar to the items the user already likes. The similarity between two items is based on how often they appear together in users' behavior records.
The ItemCF algorithm consists of two main steps:
Step 1: Calculate the similarity between items.
Step 2: For the target user, generate a recommendation list of items that are highly similar to the items the user likes.
In Fig. 1, solid lines indicate existing user behavior records. User1 and User3 both prefer ItemA and ItemC, so ItemA and ItemC are highly similar, and ItemC can be recommended to User2, who is interested in ItemA.
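The two ItemCF steps can be sketched in Python on a toy version of the Fig. 1 scenario that keeps only the records described in the text. This is an illustrative sketch, not the paper's implementation: the data layout, the function names `item_similarity` and `recommend_items`, and the use of cosine-normalized co-occurrence counts with summed similarity as the score are assumptions.

```python
import math
from collections import defaultdict

# Toy version of the Fig. 1 scenario (only the records described in the text):
# user -> set of items the user has interacted with.
behaviour = {
    "User1": {"ItemA", "ItemC"},
    "User2": {"ItemA"},
    "User3": {"ItemA", "ItemC"},
}


def item_similarity(behaviour):
    """Step 1: cosine similarity from item co-occurrence in user records."""
    count = defaultdict(int)  # number of users who interacted with item i
    co = defaultdict(int)     # number of users who interacted with both i and j
    for items in behaviour.values():
        for i in items:
            count[i] += 1
            for j in items:
                if i != j:
                    co[(i, j)] += 1
    return {(i, j): c / math.sqrt(count[i] * count[j]) for (i, j), c in co.items()}


def recommend_items(user, behaviour, sim, n=3):
    """Step 2: score unseen items by summed similarity to the user's items."""
    seen = behaviour[user]
    scores = defaultdict(float)
    for (i, j), w in sim.items():
        if i in seen and j not in seen:
            scores[j] += w
    return sorted(scores, key=scores.get, reverse=True)[:n]


sim = item_similarity(behaviour)
print(recommend_items("User2", behaviour, sim))
```

With this data the similarity of ItemA and ItemC is 2/sqrt(6), about 0.816, and ItemC is the only item recommended to User2.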

Alternating Least Squares
Alternating least squares [7] models users' ratings of items through matrix factorization. Assuming that the rating matrix $R$ ($m \times n$, for $m$ users and $n$ items) is low-rank, the main idea is to find two low-dimensional matrices $U$ ($m \times k$) and $V$ ($n \times k$), with $k \ll \min(m, n)$, such that $R \approx U V^{T}$. Fig. 2 shows the decomposition model of the matrix $R$ and explains how $R$ is decomposed.
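The factorization can be illustrated with a small Python sketch: each predicted rating is the dot product of a user's row of U and an item's row of V, so R_hat = U V^T fills in every entry, including ratings that were never observed. The factor values below are hypothetical and chosen only to show the shapes (m = 3 users, n = 4 resources, k = 2 latent factors).

```python
# Hypothetical rank-2 factors for 3 users and 4 resources (illustrative values only).
U = [[1.0, 0.5],
     [0.2, 1.0],
     [0.9, 0.1]]   # m x k user-factor matrix
V = [[4.0, 1.0],
     [1.0, 3.0],
     [2.0, 2.0],
     [3.0, 0.0]]   # n x k item-factor matrix


def predict(i, j):
    """Predicted rating r_hat_ij = u_i . v_j, i.e. one entry of U V^T."""
    return sum(a * b for a, b in zip(U[i], V[j]))


# Full m x n matrix of predicted ratings.
R_hat = [[predict(i, j) for j in range(len(V))] for i in range(len(U))]
for row in R_hat:
    print([round(x, 2) for x in row])
```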

Figure 2. R matrix decomposition model diagram
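ALS is solved by an alternating iterative process. Taking the loss to be the usual regularized squared error over the set $\Omega$ of observed ratings,
$$L(U, V) = \sum_{(i,j) \in \Omega} \left(r_{ij} - u_i^{T} v_j\right)^2 + \lambda \left(\sum_i \lVert u_i \rVert^2 + \sum_j \lVert v_j \rVert^2\right),$$
where $u_i$ is the $i$-th row of $U$, $v_j$ is the $j$-th row of $V$, and $\lambda$ is the regularization parameter, first fix one of the matrices, such as $V$, take the partial derivative of $L(U, V)$ with respect to $u_i$, and set it equal to 0 to obtain:
$$u_i = \left(\sum_{j:(i,j) \in \Omega} v_j v_j^{T} + \lambda I_k\right)^{-1} \sum_{j:(i,j) \in \Omega} r_{ij} v_j$$
Fixing $U$ in the same way gives:
$$v_j = \left(\sum_{i:(i,j) \in \Omega} u_i u_i^{T} + \lambda I_k\right)^{-1} \sum_{i:(i,j) \in \Omega} r_{ij} u_i$$
The updates of $u_i$ and $v_j$ are repeated alternately to update $U$ and $V$ until the root mean square error changes very little or the maximum number of iterations is reached [8], yielding $\hat{R} = U V^{T}$.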
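The alternating updates can be sketched in plain Python without external libraries. The sketch assumes the regularized loss L(U, V) = sum over observed (i, j) of (r_ij - u_i . v_j)^2 + lambda * (sum ||u_i||^2 + sum ||v_j||^2), a hypothetical toy set of (user, item, rating) triples, and a fixed iteration count in place of an RMSE-change test; the names `solve`, `update`, `loss` and `als` are illustrative. Because each half-step solves its least-squares subproblem exactly, the loss recorded in `history` should never increase. A distributed ALS implementation is available in Spark as `pyspark.ml.recommendation.ALS`, whose `rank`, `maxIter` and `regParam` parameters loosely correspond to k, the iteration limit and lambda (Spark additionally scales `regParam` by the number of ratings each user or item has).

```python
import random

# Hypothetical toy data: (user index, item index, rating) triples for 4 users x 4 items.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
           (1, 0, 4.0), (1, 3, 1.0),
           (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
           (3, 2, 5.0), (3, 3, 4.0)]


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def update(fixed, observed, k, lam):
    """Closed-form ridge update: (sum f f^T + lam I)^-1 (sum r f) over observed pairs."""
    A = [[lam if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for idx, r in observed:
        f = fixed[idx]
        for a in range(k):
            rhs[a] += r * f[a]
            for b in range(k):
                A[a][b] += f[a] * f[b]
    return solve(A, rhs)


def loss(ratings, U, V, lam):
    """Regularized squared error over the observed ratings."""
    err = sum((r - sum(x * y for x, y in zip(U[i], V[j]))) ** 2 for i, j, r in ratings)
    reg = sum(x * x for row in U + V for x in row)
    return err + lam * reg


def als(ratings, m, n, k=2, lam=0.1, iters=20, seed=0):
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(m)]
    V = [[rng.random() for _ in range(k)] for _ in range(n)]
    by_user = [[(j, r) for u, j, r in ratings if u == i] for i in range(m)]
    by_item = [[(i, r) for i, it, r in ratings if it == j] for j in range(n)]
    history = [loss(ratings, U, V, lam)]
    for _ in range(iters):
        U = [update(V, by_user[i], k, lam) for i in range(m)]  # fix V, solve each u_i
        V = [update(U, by_item[j], k, lam) for j in range(n)]  # fix U, solve each v_j
        history.append(loss(ratings, U, V, lam))
    return U, V, history


U, V, history = als(RATINGS, m=4, n=4, k=2, lam=0.1, iters=20)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```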

Item Similarity and Alternating Least Squares
To combine the advantages of item-based similarity and the alternating least squares algorithm while compensating for the shortcomings of each, this paper combines item similarity with the ALS algorithm. The flowchart is shown in Fig. 3.
Step 1: First, build the user rating matrix from the raw rating data, and calculate the similarity between users and the similarity between education and teaching resources. The proposed method uses the vector cosine method, also called vector space similarity (VSS) [9]. Let $N(m)$ denote the set of education and teaching resources owned by user $m$ and $N(n)$ the set owned by user $n$. The similarity between users $m$ and $n$ is:
$$w_{mn} = \frac{\lvert N(m) \cap N(n) \rvert}{\sqrt{\lvert N(m) \rvert \, \lvert N(n) \rvert}}$$
By the same principle, letting $N(i)$ and $N(j)$ denote the sets of users who own education and teaching resources $i$ and $j$, the similarity between resources $i$ and $j$ is:
$$w_{ij} = \frac{\lvert N(i) \cap N(j) \rvert}{\sqrt{\lvert N(i) \rvert \, \lvert N(j) \rvert}}$$
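The two similarity formulas can be checked on a small example. The Python sketch below computes the set-based cosine similarity for a pair of users directly from N(m) and N(n), and for a pair of resources after inverting the data into resource-to-user sets. The data and the names `owned`, `set_cosine` and `invert` are hypothetical, and treating each user's resources as a plain set, ignoring rating values, is an assumption of this sketch.

```python
import math

# Hypothetical data: user -> set of education and teaching resources the user owns, N(user).
owned = {
    "m": {"r1", "r2", "r3"},
    "n": {"r2", "r3"},
    "p": {"r1", "r4"},
}


def set_cosine(a, b):
    """|A & B| / sqrt(|A| * |B|): cosine similarity of two binary vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def invert(owned):
    """Build N(i) for each resource i: the set of users who own it."""
    users_of = {}
    for user, items in owned.items():
        for i in items:
            users_of.setdefault(i, set()).add(user)
    return users_of


users_of = invert(owned)
w_mn = set_cosine(owned["m"], owned["n"])              # user-user similarity
w_r2_r3 = set_cosine(users_of["r2"], users_of["r3"])   # resource-resource similarity
print(round(w_mn, 3), w_r2_r3)
```

Users m and n share two resources, giving 2/sqrt(3 * 2), about 0.816; resources r2 and r3 are owned by exactly the same users, so their similarity is 1.0.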