Evaluation of social network user sentiments based on fuzzy sets

This article presents a technique for evaluating social network user sentiments based on fuzzy sets. The advantage of the proposed technique is its ability to take into account a user's influence as well as the fact that a single user may be the author of several messages. The results presented in this paper can be used in mechanical engineering to analyze product reviews and in robotics to develop user communication interfaces. The paper contains experimental data and shows the steps for calculating the sentiment value of messages on a given topic. The application of the proposed technique is demonstrated on experimental data from the Twitter social network.


Introduction
Today it has become common practice for people to use social networks to express opinions on topics of interest or to write reviews of products and services they have used. Such data help solve pressing tasks such as identifying potential clients for a particular product or service, forecasting demand and customer expectations for a mechanical engineering product, developing user communication interfaces in robotics, and evaluating public opinion [1].
An analysis of related papers on the automated assessment of social network user sentiments [2][3][4][5][6] reveals that the techniques proposed there take into account neither the relationships between users nor the limitations associated with processing the linguistic hedges that frequently occur in messages. For example, when a user is the author of several messages, according to [2][3][4][5] the assessment result depends on the total number of messages, regardless of how many users wrote them.
In [7] we presented a method of sentiment analysis that takes into account users' influence within a social network as well as the effect of repetitive messages. The accuracy of the results that the method can provide depends on the accuracy of message parsing. The aim of this paper is to present the message parsing algorithm as well as the application of the method to experimental data from the Twitter social network.

Theoretical analysis
To evaluate the sentiments of a social network user, it is necessary to consider the user's relevant messages on a given topic. Taking a sentence as the unit of a message, we assume that each sentence contains positive, negative or neutral sentiments regarding certain subject (product) characteristics $K_i$ from a predefined set of characteristics $K = \{K_i\}$, $i = 1, \dots, n$, where $n$ is the number of characteristics. Evaluating a product characteristic means assigning it one or several elements of the term set of the $i$-th characteristic. To assess a user's sentiments, it is necessary to obtain the sentiment degree of each term $T_{i,j}$, which can be represented as a membership function of the linguistic variable. To generate the membership functions of terms automatically, the SentiWordNet 3.0 [8] component was proposed in [7]. To define the membership functions of terms, we introduce the universal set U = {P ("Positive"), O ("Objective"), N ("Negative")}. Assuming that a term cannot be positive, negative and objective simultaneously with a membership degree of 1, we impose the normalizing rule $\mu_P + \mu_O + \mu_N = 1$, where $\mu_P$, $\mu_O$, $\mu_N$ are the values of the membership function for the elements P, O, N of the fuzzy set "Sentiment degree". Thus, each term can be defined by its membership function on U.
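The normalizing rule can be illustrated with a minimal sketch. It assumes that a term comes with a positive and a negative score in [0, 1] whose sum does not exceed 1, and that the objective share is the remainder, which is how SentiWordNet defines its objectivity score. The class and function names and the example scores are hypothetical and are not taken from [7] or from SentiWordNet data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentDegree:
    """Membership values of one term on U = {P, O, N}."""
    positive: float
    objective: float
    negative: float


def from_scores(pos_score: float, neg_score: float) -> SentimentDegree:
    """Build a membership triple that satisfies mu_P + mu_O + mu_N = 1.

    Assumption: both scores lie in [0, 1], their sum is at most 1,
    and the objective share is whatever remains.
    """
    if not (0.0 <= pos_score <= 1.0 and 0.0 <= neg_score <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if pos_score + neg_score > 1.0:
        raise ValueError("positive and negative scores must sum to at most 1")
    return SentimentDegree(pos_score, 1.0 - pos_score - neg_score, neg_score)


# Hypothetical scores for the term "good" (illustrative values only).
good = from_scores(pos_score=0.75, neg_score=0.0)
total = good.positive + good.objective + good.negative
```

With these illustrative scores the three membership values sum to 1, so the term is fully described by its membership function on U.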
When a term is used with a linguistic modifier, Zadeh's operations on fuzzy sets (concentration, dilation) should be applied [9,10]. For instance, according to Zadeh, if a term is used with the modifier "very", its membership function is obtained by squaring the membership function of the atomic term. However, since $0 \le \mu_k \le 1$, we have $\mu_k^2 \le \mu_k$, which conflicts with the idea that a term with "very" should have a greater maximum membership value than the atomic term. Similar reasoning applies to terms with the modifier "somewhat".
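The effect described above can be checked numerically. The sketch below applies the classical operators (concentration as squaring, dilation as the square root) to a few hypothetical membership values of an atomic term; the sample values are illustrative and are not taken from the paper.

```python
import math


def concentration(mu: float) -> float:
    """Zadeh's concentration, commonly used for the modifier "very"."""
    return mu ** 2


def dilation(mu: float) -> float:
    """Zadeh's dilation, commonly used for the modifier "somewhat"."""
    return math.sqrt(mu)


# Hypothetical membership values of an atomic term such as "good".
good = [0.0, 0.25, 0.5, 0.81, 1.0]
very_good = [concentration(mu) for mu in good]
somewhat_good = [dilation(mu) for mu in good]

# Concentration never raises a membership value on [0, 1],
# and dilation never lowers one.
never_raised = all(v <= g for v, g in zip(very_good, good))
never_lowered = all(s >= g for s, g in zip(somewhat_good, good))
```

Both checks hold for any value in [0, 1], which is why the raw result of concentration cannot exceed the membership of the atomic term.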
We propose normalizing the result of applying Zadeh's operation (either concentration or dilation) to the fuzzy set of a term with a modifier, which yields a normalized membership value for the modified term. Figure 1 depicts the membership functions for the terms "good", "very good" without normalization, and "very good" with normalization. With that in mind, the membership functions of the fuzzy sets representing the terms of the linguistic variable "Sentiment characteristic" should be similar to those depicted in figure 2.
Another aspect that should be taken into account is the degree of user influence. As a measure of this characteristic we propose using the closeness centrality metric according to [11]. Considering social network users as experts on a certain subject [12], we can calculate the percentage of users whose assessments contain statements matching term $z$ of the linguistic variable, where $R_i$ is the influence degree of the $i$-th user, $u$ is the number of users, and $q$ is the number of users commenting on the subject with attributes corresponding to $z$.
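As a rough illustration of how influence could enter the calculation, the sketch below computes closeness centrality with a breadth-first search on a small unweighted, undirected graph, using the common definition (number of other reachable nodes divided by the sum of shortest-path lengths); the definition used in [11] may differ. The function `influence_weighted_share` is an assumed form, the ratio of the summed influence of the matching users to the summed influence of all users, and should not be read as the formula of the paper. The graph and user names are hypothetical.

```python
from collections import deque


def closeness_centrality(graph: dict[str, set[str]], node: str) -> float:
    """Other reachable nodes divided by the sum of shortest-path lengths."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def influence_weighted_share(influence: dict[str, float], matching: set[str]) -> float:
    """ASSUMED form: influence of users matching a term over total influence."""
    total = sum(influence.values())
    if total == 0:
        return 0.0
    return sum(influence[user] for user in matching) / total


# Hypothetical three-user network: a - b - c.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
influence = {user: closeness_centrality(graph, user) for user in graph}

# Suppose only user "b" used attributes corresponding to term z.
share = influence_weighted_share(influence, {"b"})
```

In this toy network the central user "b" has the highest closeness, so a statement from "b" contributes more to the share than a statement from "a" or "c" would.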
One of the most difficult tasks in evaluating social network user sentiments is text processing. Text processing comprises two steps: splitting the text into a set of sentences and parsing each sentence separately. The result of sentence parsing is a word dependency tree, which makes it possible to find a set of connected words related to a specific feature. Text splitting and sentence processing are implemented in the Stanford NLP library [13], which includes a set of classes and training data. The Stanford NLP library parses a sentence into a set of pairs. Each pair contains a relation name, a master word and a dependent word. Every sentence has a root relation that includes only a master word; typically this is the head word of the sentence's main predicate. Additionally, the library assigns a part of speech to each word. Based on the parsing result it is possible to build a word tree. Most relations between words can be considered separately. For emotion evaluation the "advmod" and "amod" relations are significant. The "advmod" relation links an adjective with its adverbial modifier. The "amod" relation links a noun, which may be a feature name or its synonym, with the related adjective.
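The way such pairs can be turned into a tree and searched for "amod" and "advmod" relations can be sketched as follows. The triples are written by hand for the sentence "Bob is not a very smart cat" and only mirror the relation names mentioned in the text; they are not actual parser output, and no Stanford NLP API is called. Using words as node keys assumes that each word occurs once in the sentence, which is a simplification. The function names are hypothetical.

```python
from collections import defaultdict

# Hand-written (relation, master, dependent) triples; not real parser output.
TRIPLES = [
    ("root", "ROOT", "cat"),
    ("nsubj", "cat", "Bob"),
    ("cop", "cat", "is"),
    ("neg", "cat", "not"),
    ("det", "cat", "a"),
    ("amod", "cat", "smart"),
    ("advmod", "smart", "very"),
]


def build_tree(triples):
    """Map each master word to its list of (relation, dependent) pairs."""
    children = defaultdict(list)
    for relation, master, dependent in triples:
        children[master].append((relation, dependent))
    return children


def terms_for(children, noun):
    """Return (adjective, [adverbs]) pairs attached to `noun` via amod/advmod."""
    result = []
    for relation, adjective in children.get(noun, []):
        if relation == "amod":
            adverbs = [dep for rel, dep in children.get(adjective, [])
                       if rel == "advmod"]
            result.append((adjective, adverbs))
    return result


tree = build_tree(TRIPLES)
terms = terms_for(tree, "cat")
```

For the feature noun "cat" this yields the adjective "smart" together with its modifier "very", which is the term-plus-modifier pair that the membership functions are later applied to.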
The negation relation should be processed differently, because the propagation of negation over sentence parts significantly affects word meanings. While building the dependency tree it is possible to assign a negation value to each node and to detect double negations. Double negations occur primarily in complex sentences. The Stanford NLP library does not trace negation propagation. In our model the negation value is assigned during the creation of the word dependency tree. Negation propagates from the master node to all dependent nodes. Figure 3 shows an example of a dependency tree. Using such a tree as input data for the emotional analysis of a sentence, it is possible to find the characteristic and its terms and to obtain a sentiment value for the sentence. We propose performing the emotional analysis of a sentence in the following way:
- A word tree containing a negation value for each node is built.
- The tree is traversed for noun nodes.
- Every noun node is checked for correspondence to the analysed characteristics from $K$.
- In case of correspondence, the dependent adjectives and adverbials (terms $T_{i,j}$) are processed.
Figure 3. An example of a word dependency tree for the sentence "Bob is not a very smart cat". Relations: "cop": copula; "neg": negation; "amod": adjectival modifier; "nsubj": nominal subject; "det": determiner; "advmod": adverbial modifier.
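A minimal sketch of negation propagation over such a tree is given below. It assumes that a node is negated when it has a "neg" dependent, that negation is inherited by all dependent nodes, and that a second negation further down the tree cancels the inherited one; this toggle rule is one possible reading of double negation handling and is not confirmed by the source. The triples are hand-written, and the set of characteristic nouns is hypothetical.

```python
from collections import defaultdict

# Hand-written (relation, master, dependent) triples; not real parser output.
TRIPLES = [
    ("root", "ROOT", "cat"),
    ("nsubj", "cat", "Bob"),
    ("cop", "cat", "is"),
    ("neg", "cat", "not"),
    ("det", "cat", "a"),
    ("amod", "cat", "smart"),
    ("advmod", "smart", "very"),
]


def negation_flags(triples, root="ROOT"):
    """Assign a negation value to every word while walking the tree."""
    children = defaultdict(list)
    for relation, master, dependent in triples:
        children[master].append((relation, dependent))

    flags = {}

    def visit(node, inherited):
        # Assumed rule: an own "neg" dependent toggles the inherited value.
        own = sum(1 for rel, _ in children.get(node, []) if rel == "neg")
        negated = inherited ^ (own % 2 == 1)
        flags[node] = negated
        for rel, child in children.get(node, []):
            if rel != "neg":  # the negation word itself carries no flag
                visit(child, negated)

    for _, child in children.get(root, []):
        visit(child, False)
    return flags


CHARACTERISTICS = {"cat"}  # hypothetical nouns standing in for the set K
flags = negation_flags(TRIPLES)

# Adjectives attached to characteristic nouns, with their negation value.
found = [(dep, flags[dep]) for rel, master, dep in TRIPLES
         if rel == "amod" and master in CHARACTERISTICS]
```

Here the negation attached to "cat" reaches "smart" and "very", so the term "very smart" would be evaluated as negated for this sentence.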

Experimental analysis
This section is devoted to the analysis of experimental data. Experiments based on the theoretical reasoning of the previous section were conducted, and a software application was developed for this purpose. The application downloads a set of tweets related to a specific topic and calculates the sentiment characteristic (SC). Figure 4 shows the component architecture of the software. Additionally, data about users' relations were analysed and social graphs were built. One of the created social graphs is shown in figure 5. If the weight of a link is missing, it is taken to be 1. The sentiment score (ss) of a message can be calculated either by a mock object or by automatic processing of the message content, depending on the component configuration (figure 4, SentimentAnalyzer component). The value generated by the mock object varies from -1 to 1. If a message is reposted, the sentiment score of the original message is used.
Table 1 presents the input data, which contain messages from users 1-10 and on which the social graph was built (figure 5). Table 1 uses the following notation: $m_{i,j}$ is the $j$-th message of the $i$-th user; $rm_{i,j}$ is a retweet of message $m_{i,j}$. Because one user can be the author of several messages, closeness centrality and the ss metric were calculated for every user. The values of the ss metric are used to evaluate the membership degree $\mu$ in the fuzzy terms of the linguistic variable SC. Using the data from Table 1, it is possible to calculate how users' sentiments are distributed among the different SC terms. Figure 6 gives a visual representation of the data from Table 1.
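The handling of reposts, missing link weights and users with several messages can be sketched as follows. The data layout, field names and values are hypothetical, and averaging a user's message scores is an assumption made only for illustration; the source does not state how the per-user ss value is aggregated.

```python
from collections import defaultdict
from statistics import mean

DEFAULT_LINK_WEIGHT = 1.0  # a missing link weight is taken to be 1


def link_weight(link: dict) -> float:
    """Return the weight of a social-graph link, defaulting to 1."""
    return link.get("weight", DEFAULT_LINK_WEIGHT)


def message_score(message_id: str, messages: dict) -> float:
    """A repost reuses the sentiment score of the original message."""
    message = messages[message_id]
    while message.get("retweet_of") is not None:
        message = messages[message["retweet_of"]]
    return message["score"]


def user_scores(messages: dict) -> dict:
    """ASSUMPTION: one score per user, the mean over that user's messages."""
    per_user = defaultdict(list)
    for message_id, message in messages.items():
        per_user[message["user"]].append(message_score(message_id, messages))
    return {user: mean(scores) for user, scores in per_user.items()}


# Hypothetical data in the spirit of the m_{i,j} / rm_{i,j} notation.
messages = {
    "m1,1": {"user": 1, "score": 0.5},
    "m1,2": {"user": 1, "score": -0.25},
    "rm1,1": {"user": 2, "retweet_of": "m1,1"},
}
scores = user_scores(messages)
```

With one value per user, an author of several messages is counted once, so the total number of messages alone no longer determines the result.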
The results depicted in figure 6 show that users' sentiments on the analyzed topic vary significantly, from negative to highly positive. Nevertheless, it is possible to infer that more than half of the users have a positive attitude toward the analyzed issue.
Further research should focus on developing faster and more accurate algorithms [14] for the semantic processing of sentences and for user influence evaluation when data are aggregated from various social networks. Data from various social networks could be obtained via the Web Observatory infrastructure, while the sentiment analysis tool could be listed in the Web Observatory [15] catalog of analytics and visualization tools. Challenging tasks include detecting social relationships, developing a unified social graph and matching user profiles across different networks. The interface of the "Analyzer" component should be modified to expose its functionality via a standardized and documented REST interface, allowing developers to create their own visualization tools (figure 4).

Conclusion
This paper described the message parsing algorithm for evaluating social network user sentiments. Sentiment characteristic evaluation was performed on data obtained from Twitter. A series of experiments was conducted using a software application developed as part of the research. The experimental results confirmed the applicability of the presented technique to the analysis of users' emotions. The evaluation of social network user sentiments was obtained taking into account user influence within the social network as well as the effect of repetitive messages. The proposed technique is applicable to potential customer targeting, demand prediction and the evaluation of opinion ratings for a product or service.