Abstract
In this study, movie reviews are used as a data set from which related phrases, topics, and sentiment scores are extracted. Based on user information, users' behaviour preferences and their influence are analysed, and the semantic information in the text is mined from multiple perspectives. A variety of data processing and machine learning methods are used, including text segmentation, the Apriori association rule mining algorithm, sentiment analysis, linear fitting, the TF-IDF algorithm, PCA dimensionality reduction, and the LDA topic model. Because topic extraction in the LDA algorithm is coarse-grained and therefore not suitable for short text, this paper also proposes a new topic model based on improved k-means and TextRank, which gets good results on this dataset. This paper uses multiple data mining models to analyse film reviews and presents an empirical study of the efficacy of machine learning techniques in text semantic mining.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.