Paper The following article is Open access

Exploratory and Predictive Analytics of User Preferences from Kaggle LEGO-Toys Datasets Using Spark ML

Published under licence by IOP Publishing Ltd
Citation: Pritika Bahad et al 2021 IOP Conf. Ser.: Mater. Sci. Eng. 1099 012019. DOI 10.1088/1757-899X/1099/1/012019

Abstract

Apache Spark is an open-source distributed data processing framework. This paper presents a processing architecture for exploring and predicting user preferences using Apache Spark. The architecture is evaluated on LEGO-toys datasets covering the period 1949-2019 using Spark Machine Learning (ML) algorithms. The large datasets analyzed consist of LEGO-toys parts, categories, themes and colour features. Spark ML algorithms are applied in three ways: (i) k-means clustering to identify commonalities in LEGO-toys themes and colours, (ii) classification using Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF) algorithms for theme-preference identification, and (iii) regression using linear regression, decision tree regression, RF and gradient boosting to identify the colour shift in user preferences. The paper describes the steps for analytics based on Spark and presents the results of the exploratory and predictive analytics. The evaluation metrics show that the ensemble regression models predict better than the other algorithms. The analytics yield several notable results. For example, the LEGO company's products have become more colourful (children's preferences exhibit a spectral shift and a wider range of colours), more diversified and more multifaceted over time. The architecture helps in identifying directions for new designs in future LEGO products. The proposed architecture can also be employed in related domains to predict product and user preferences.
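As an illustration of step (i), the following is a minimal sketch of k-means clustering applied to colour values. It is written in plain Python rather than Spark ML so that it runs without a cluster, and it is not the authors' implementation; in a Spark pipeline the corresponding estimator would be `pyspark.ml.clustering.KMeans`. The function name `kmeans`, the RGB values and the choice of initial centroids are all assumptions made for the example and are not taken from the LEGO datasets.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical RGB colours (illustrative only, not from the LEGO datasets):
# three reddish and three bluish values.
colours = [
    (200, 20, 30), (220, 40, 35), (210, 30, 25),
    (20, 40, 200), (30, 60, 220), (25, 50, 210),
]

# Assumed initialisation: one reddish and one bluish point as starting centroids.
labels, centroids = kmeans(colours, centroids=[colours[0], colours[3]])
print(labels)     # cluster index assigned to each colour
print(centroids)  # mean RGB value of each cluster
```

With these inputs the reddish and bluish colours fall into separate clusters, and each centroid is the mean RGB value of its group. Spark ML's implementation differs in initialisation and runs distributed over partitions, so this sketch conveys only the underlying idea.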

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
