Music genre influence and artist similarity based on data analysis

Using cluster analysis on the provided data set, this paper builds a mathematical model for measuring music similarity, and from it derives the influence of different music genres on their followers and the similarity of artists across genres. The data are first standardized and normalized. Because there is no established indicator for measuring music similarity, we use SPSS to perform K-means clustering on the provided music characteristics and genre-related indicators, which divides the artists into two categories for analysis. To check the reliability of the model, this article uses deep learning, training on 70% of the data and testing on the remaining 30%, and finally establishes a complete mathematical model.


Introduction
Music has been an important part of the cultural heritage of human society since ancient times. To understand the role of music in the collective human experience, this article develops a method to quantify the evolution of music. When artists create a new piece of music, many factors can influence them, including their innate originality, current social or political events, access to different instruments or tools, and other personal experiences. The purpose of this article is to understand and measure the influence of previously produced music on contemporary music and musical artists.

Musical similarity analysis
This paper establishes a model that measures music similarity. Since the average value of each indicator is known for every artist, 13 indicators are used as feature data in SPSS: danceability, energy, valence, tempo, loudness, mode, key, acousticness, instrumentalness, liveness, speechiness, duration, and popularity. After outliers are removed and the data are standardized, K-means clustering divides the artists into two categories. To assess the degree of similarity between artists of various genres, each artist is then matched to a genre and the proportion of each genre in each category is calculated. [3][4][5]
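The standardize-then-cluster step described above can be sketched in NumPy. This is a minimal illustration, not the SPSS procedure used in the paper: the data are synthetic, and the function names `standardize` and `kmeans`, the farthest-first seeding, and the iteration limit are all assumptions made for the example.

```python
import numpy as np

def standardize(X):
    """Z-score each column (indicator) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std

def kmeans(X, k=2, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first seeding; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Seeding (an assumption of this sketch): one random point, then the
    # point farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for the artist table: 200 artists x 13 indicators.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 13))
group_b = rng.normal(loc=6.0, scale=1.0, size=(100, 13))
X = standardize(np.vstack([group_a, group_b]))
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels, minlength=2))  # number of artists per category
```

With real data, each row of `X` would hold one artist's averaged indicator values after outlier removal.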

Cluster analysis
First, draw a scree plot of the characteristic values (eigenvalues) of these 13 indicators. The D value is minimized at k=2, so the artists are divided into two categories.
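The paper does not say how the plot was produced or how the D value is defined. One common reading, assumed here, is that the plot shows the eigenvalues of the indicators' correlation matrix in descending order. The sketch below computes those values on synthetic data; the function name `scree_values` is hypothetical.

```python
import numpy as np

def scree_values(X):
    """Eigenvalues of the column correlation matrix, largest first."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # ascending order for symmetric input
    return eigvals[::-1]

# Synthetic data: 300 rows x 13 indicators, with two indicators made correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=300)

values = scree_values(X)
for i, v in enumerate(values, start=1):
    print(f"component {i}: eigenvalue {v:.3f}")
```

The eigenvalues of a correlation matrix sum to the number of indicators, so a plot of these values shows how much of the total variance each component carries.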
The clustering results for the music of the 19 genres are then analysed. As shown in the following table, due to space constraints, only some of the most influential genres are shown:
For a more precise quantitative analysis, the Pearson correlation coefficient between each pair of indicators is calculated and the resulting matrix is drawn as a heat map, as shown in Figure 2. By the definition of the correlation coefficient, the entries on the diagonal of the matrix are 1. The lighter a cell of the heat map, the closer the correlation coefficient of the two indicators is to 1 and the stronger their positive correlation; the darker the cell, the closer the coefficient is to 0 and the weaker the correlation.
Most indicators are correlated with others. According to the heat map, the strongest correlations, in descending order, are loudness and energy (0.8), acousticness and energy (0.79), acousticness and loudness (0.62), valence and danceability (0.58), and loudness and popularity (0.48).
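As a rough illustration of how such a correlation matrix and ranking can be produced, the sketch below computes Pearson coefficients column by column and sorts the indicator pairs by absolute value. The data are synthetic and do not reproduce the figures reported above; `pearson_matrix` and `ranked_pairs` are hypothetical helper names.

```python
import numpy as np

def pearson_matrix(X):
    """Pearson correlation between every pair of columns of X."""
    Z = X - X.mean(axis=0)
    cov = Z.T @ Z
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

def ranked_pairs(corr, names):
    """Off-diagonal pairs sorted by absolute correlation, strongest first."""
    pairs = [
        (names[i], names[j], corr[i, j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Synthetic indicators: loudness is built to correlate with energy.
names = ["energy", "loudness", "acousticness"]
rng = np.random.default_rng(1)
energy = rng.normal(size=500)
loudness = 0.8 * energy + 0.6 * rng.normal(size=500)
acousticness = rng.normal(size=500)
X = np.column_stack([energy, loudness, acousticness])

corr = pearson_matrix(X)
for a, b, r in ranked_pairs(corr, names):
    print(f"{a} vs {b}: r = {r:.2f}")
```

Sorting by absolute value treats strong negative and strong positive correlations alike; whether the paper's ranking does the same is not stated.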
Next, the artists in the two categories are matched to their genres to obtain the genre distribution of category 1 and category 2, as shown in the table. In this model, the probability that artists of the same genre are assigned to the same category is as high as 98%, with an average of about 76.17%. Therefore, according to our music similarity model, artists of the same genre can be considered more similar to each other than artists of different genres.
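The paper does not spell out how the per-genre probability is computed. One plausible reading, assumed here, is the share of a genre's artists that fall into that genre's majority category, averaged over genres. The toy labels and the function names below are hypothetical and are not the paper's data.

```python
from collections import Counter, defaultdict

def genre_cluster_shares(genres, labels):
    """For each genre, the fraction of its artists in each category."""
    counts = defaultdict(Counter)
    for genre, label in zip(genres, labels):
        counts[genre][label] += 1
    return {
        genre: {label: n / sum(c.values()) for label, n in c.items()}
        for genre, c in counts.items()
    }

def same_cluster_rate(shares):
    """Per genre, the share held by its majority category."""
    return {genre: max(s.values()) for genre, s in shares.items()}

# Hypothetical toy input: one genre name and one category label per artist.
genres = ["Pop/Rock"] * 4 + ["Jazz"] * 4
labels = [1, 1, 1, 2, 2, 2, 2, 2]

shares = genre_cluster_shares(genres, labels)
rates = same_cluster_rate(shares)
average_rate = sum(rates.values()) / len(rates)
print(rates)
print(average_rate)
```

Under this reading, the highest per-genre rate would correspond to the reported maximum and `average_rate` to the reported average.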

Test Model [2]
A deep learning architecture contains multiple network layers. By analyzing features layer by layer, it can extract further features and improve the learning ability of the network. A deep network can therefore fit the same function with fewer neurons, which makes the learning process more efficient and accurate [1]. For this reason, the results of the model above are tested using deep learning.
The model has four layers: one input layer, two hidden layers, and one output layer. The training model is as follows.
Fig3. The training model of the neural network [3]
Backpropagation is used to calculate the gradient of the loss function with respect to the parameters. Figure 4 shows the flow chart of forward and backward propagation:

Fig4. The flow chart of forward and backward propagation
Here, the learning rate is set to 0.0075. The results are as follows: the accuracy on the training set is 0.98, and the accuracy on the test set is 0.97. The model is therefore relatively stable, and its results are accurate and credible.
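A minimal NumPy sketch of such a four-layer network, trained by forward and backward propagation on a 70/30 split, is given below. Only the layer count, the learning rate of 0.0075 and the split ratio come from the text. The hidden-layer sizes, ReLU and sigmoid activations, cross-entropy loss, epoch count and synthetic two-class data are assumptions, so the printed accuracies are not the paper's results.

```python
import numpy as np

def init_params(sizes, rng):
    """He-initialised weights and zero biases for consecutive layer sizes."""
    return [
        (rng.normal(size=(n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, X):
    """Forward pass; returns the activations of every layer."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        is_output = i == len(params) - 1
        # Assumed activations: sigmoid on the output, ReLU on hidden layers.
        acts.append(1.0 / (1.0 + np.exp(-z)) if is_output else np.maximum(z, 0.0))
    return acts

def backward(params, acts, y):
    """Backpropagation for mean binary cross-entropy; returns [(dW, db), ...]."""
    grads = []
    delta = (acts[-1] - y[:, None]) / len(y)  # gradient at the output pre-activation
    for i in reversed(range(len(params))):
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            # Propagate through the weights, then through the previous ReLU.
            delta = (delta @ params[i][0].T) * (acts[i] > 0)
    return grads[::-1]

def train(X, y, sizes, lr=0.0075, epochs=3000, seed=0):
    """Full-batch gradient descent with the learning rate quoted in the text."""
    params = init_params(sizes, np.random.default_rng(seed))
    for _ in range(epochs):
        grads = backward(params, forward(params, X), y)
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params

def accuracy(params, X, y):
    """Fraction of rows whose thresholded output matches the label."""
    pred = forward(params, X)[-1][:, 0] > 0.5
    return float((pred == (y > 0.5)).mean())

# Synthetic two-category data standing in for the clustered artists.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 13)) + 3.0 * y[:, None]  # category-dependent shift
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 70% of the rows for training, 30% for testing.
order = rng.permutation(n)
cut = int(0.7 * n)
train_idx, test_idx = order[:cut], order[cut:]

# 13 inputs, two hidden layers (sizes assumed), 1 output.
params = train(X[train_idx], y[train_idx], sizes=[13, 16, 8, 1])
train_acc = accuracy(params, X[train_idx], y[train_idx])
test_acc = accuracy(params, X[test_idx], y[test_idx])
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

Comparing the two accuracies is what supports a stability claim: a large gap between them would indicate overfitting to the training split.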

Conclusion
When performing data analysis, neural networks can reveal the relationships in the data more clearly and intuitively, cluster analysis can be used to classify artists, and the Pearson correlation coefficient matrix among the indicators supports a more accurate quantitative analysis. Deep learning can extract more features, improve the learning ability of the network, and make the learning process more efficient and accurate.