Telkom UData sentiment analysis using crowdsourcing and trust

Microblogging sites have millions of people sharing their thoughts daily because of their characteristically short and simple manner of expression. Sentiment analysis is often used to analyse customer opinions regarding brand images or products. However, not all sentiment generated by existing machine-based algorithms yields satisfying results, mostly because of the lack of uniformity in the informal language used in social media sentences. This condition also occurs in Telkom UData: in our preliminary study, the machine-based analysis provided less than optimal results in analysing sentiment. This research offers a concept that adds human interaction through crowdsourcing, where people are involved in analysing sentiments while forming a new training dataset at the same time. The results show that sarcastic and contradictory sentences can be recognized by humans and then utilized as new training data for further machine learning. In this experiment, the approach is likely to increase the accuracy of the sentiments in UData, with up to 39% of neutral sentiments becoming polarized as positive or negative. We also simulated a trust concept through sociometry to ensure that the crowdsource workers are trusted and capable enough in analysing sentiments on social media.


Introduction
Since the 1990s, the internet has grown rapidly and become a powerful medium for gaining information. Some of that information is the voice of customers, obtained through microblogging channels known as social media. Customers now have enough trusted information from their global peers that they can select the best product and the best service provider available to them every single time. Every single action on a social network is a proactive expression of what a customer sees, thinks and feels. Smart companies will embrace social support and social selling [1]. Statistics on social network penetration in Indonesia show that, as of the fourth quarter of 2016, 40% of the total Indonesian population were active social media users. The most popular social platforms are YouTube, Facebook, Twitter and Instagram, with penetration rates of more than 50% [2].
In the telecom industry, providers such as Telkom Indonesia are facing a period in which competition leads to creativity rather than to price wars. The most creative products will win the battlefield. Creative products are those that fit customers' needs and attract their interest. This creativity should be achieved by looking for new product ideas and perceptions from Telkom product users: the customers.
With the rise of the social networking era, there has been a surge of user-generated content. Microblogging sites have millions of people sharing their thoughts daily because of their characteristically short and simple manner of expression. Sentiment analysis is usually used as an integral part of social listening. It is often used to analyze customer opinions regarding brand images or products. The main problem in identifying sentiment in sentences is that the meaning of an opinion is sometimes not clearly implied. Despite the heavy usage of sentiment analysis applications, there is still room for improvement. One of the remaining challenges in sentiment analysis is sarcastic and contradictory sentences. Research has been conducted to identify sarcasm in sentences, especially in Bahasa Indonesia, and observed that people on Indonesian social media tend to criticize certain topics using sarcasm [3]. That study found that additional features for recognizing sarcasm, based on negativity and interjection words, can increase accuracy by up to 6%. Although the accuracy did not increase dramatically, the research did provide encouraging results in reducing the inability of computers to recognize sentences that have a double meaning.
Another way to address the inability of computers to recognize these kinds of sentences is to introduce human interaction into sentiment identification. Involving human work in the online analysis of large-scale data is called crowdsourcing. Crowdsourcing is an online, distributed problem-solving and production model that has emerged in recent years. It involves people, as volunteers or hired workers, in finding a specific solution to a problem [4]. Notable examples of the model include Wikipedia, iStockphoto, InnoCentive, and Amazon's Mechanical Turk (MTurk), an online marketplace for work that requires human intelligence. A recent example in Indonesia is www.kawalpemilu.org, which showed the power of netizens voluntarily contributing their effort to validate the final results of the Indonesian Presidential Election in 2014. Since the sentiments retrieved from the internet are large-scale data, many people should be involved in this process [5].
Several studies have involved a crowdsourcing approach. Barbier [6] stated that crowds of people can solve some problems faster than individuals or small groups. A crowd can also rapidly generate data about circumstances affecting the crowd itself. This crowdsourced data can be leveraged to benefit the crowd by providing information or solutions faster than traditional means. Other research, such as Djelassi [7], focused on customers' participation in a product development process through crowdsourcing practices. Crowdsourcing generates a win-win relationship, creating value for both firms and customers. The results suggest the need to establish an open business model based on crowdsourcing. Similarly, Mukherjee [8] used Amazon Mechanical Turk to crowdsource fake hotel reviews, while truthful reviews were obtained from the TripAdvisor website. They tried several classification approaches that have been used in related tasks such as genre identification, psycholinguistic deception detection, and text classification.
Telkom itself has started to utilize social media as a resource from which its customers can obtain useful information for the sustainability of their business. One way it does so is by providing a free service on the Telkom UData website at http://socmed.udata.id. This portal lets the public obtain sentiments based on certain keywords, which is very useful for analyzing a particular issue, especially one related to customers.
In this research, the sentiments gathered from UData are used as pre-processing data for crowdsourcing. The machine learning still needs improvement, while improvements to the accuracy of NLP, especially for the Indonesian language, are still at the planning stage. Under current conditions, some tweets containing sarcasm and contradictions are not fully recognized, for instance when Telkom observes the sentiments as a product owner.
We propose crowdsourcing in this experiment because UData is a third-party application and creating a new training dataset is the only way to improve its performance. Another reason is to find an alternative method of sentiment analysis other than the machine-based algorithms that have been studied by many researchers.
As an interesting example, when we enter the keyword "indihome" we get the sentiment composition shown in Figure 1, where neutral (containing no polarized sentiment) has the greatest share at 46%. We then look more closely at the sentences labeled as neutral, for instance the one shown in Figure 2 below. This raises a question: the sentence "Bisa mohon dibantu untuk proses berhenti berlangganan indihome via twitter?", or in English "Would you please help me to stop subscribing indihome via twitter?", should be negatively polarized, yet UData did not recognize it as such. From Telkom's point of view as a product owner, this is an example of lost customer loyalty that is not explicitly stated.

Figure 2. Sentence polarized as neutral
Based on the background stated above, we identify the research questions as follows:
1. How can the accuracy of UData information, in this case sentiment analysis obtained from a social network (Twitter), be increased by involving human interaction through a crowdsourcing approach?
2. How can the trust of the crowdsource workers be identified, in order to improve the trustworthiness of the sentiments obtained?
The hypothesis of this research is that adopting a crowdsourcing method to construct the training dataset can improve the sentiment obtained from the machine-based algorithm. The words used on Twitter are mostly unstructured and informal. The crowdsourcing result should produce a more accurate picture of customer opinion. The result can be useful for monitoring customer needs, analyzing future business growth, predicting future business trends, making more accurate decisions, and finding fresh ideas or new business opportunities. Moreover, we can also compute a trust weight for each crowdsource worker to further strengthen the credibility of the sentiments they propose.
As a delimitation, this research limits its scope to one Telkom product (i.e. indihome), and the crowdworkers involved are Telkom employees only. All tweets obtained from Twitter are filtered for sentences that contain the keyword 'indihome', to focus on sentiment about that specific Telkom product. Since the crowdworkers involved are restricted to Telkom Group, this crowdsourcing model is known as Enterprise Crowdsourcing.

Literature Review
With more and more social media emerging in this era, people are curious to know what is being discussed on social media in real time. Sentiment analysis plays an important role in mapping the growing issues among netizens, so there is a need to ensure that the sentiments produced by the machine are reasonably accurate. This section provides an overview of the terminology, methods, and limitations related to the existing systems.

Sentiment Analysis on UData
Sentiment analysis, sometimes called opinion mining, is used to extract and analyze the emotions conveyed in texts [9]. Numerous studies from various domains take advantage of sentiment analysis; for instance, it is applied to detecting political stances, characterizing personality, measuring happiness, and even predicting the stock market. With the growing use of the World Wide Web, the demand for sentiment analysis is increasing. We use tweets already collected by Telkom UData, which can be freely accessed at https://socmed.udata.id. UData is a social media analytics portal developed by Telkom that already provides sentiment analytics in Bahasa Indonesia. The sentiment results can be retrieved as tweets labeled positive, negative, or neutral.

Crowdsourcing
Crowdsourcing is a term, sometimes associated with Web 2.0 technologies, that describes the outsourcing of tasks to a large, often anonymous community [10]. It is an online, distributed problem-solving and production model that has emerged in recent years. Crowdsourcing involves people, as volunteers or hired workers, in finding a specific solution to a problem. Notable examples of the model include Wikipedia, iStockphoto, InnoCentive, and Amazon's Mechanical Turk (MTurk). A recent example in Indonesia is www.kawalpemilu.org, which demonstrated the power of netizens voluntarily contributing their effort to validate the final result of the Indonesian Presidential Election in 2014.
Crowdsourcing has great potential for verifying sentiment results through human involvement, which is useful for checking whether those results are actually accurate. In this research, we built a crowdsourcing mechanism to improve the sentiment labels of tweets so that the analysis is better and the sentiment more accurate. Every crowdsource worker (or crowdworker) is given some tweets to verify. For each collected verification, the worker is awarded points.
The crowdworkers are volunteers chosen from Telkom employees, on the consideration that they are familiar enough with Telkom products and have a better understanding of customer perception of those products.

Trust
According to Kim et al. [11], human work cannot be judged only by the concept of quality; trust needs to complement it. Predictive evaluation of data based on trust is important because, if we cannot verify the data, we have to trust the provider. Understanding the interaction between the quality of data and the trust of its source is therefore most important. Incorporating more trust mechanisms, especially in crowdsourced practices, will help to filter out low-quality data. Recent trust and reputation concepts have covered a significant number of elements, such as reliability and honesty assessment. However, to be implemented in an open community system like crowdsourcing, other parameters should be added to improve the quality of judgements and recommendations, such as anonymous detection and sociometrics between agents. Situm has categorized existing algorithms for computing trust among friends; Table 1 below reviews some of the currently available literature on existing approaches to trust computation in social networks or peer-to-peer networks [12]. This experiment proposes a trust approach similar to Josang's, with a peer-to-peer interaction domain, graph-based evaluation and a distrust (less-trusted) judgement; however, we use a gradual representation instead, to accommodate the human interaction in crowdsourcing. According to Haydar [13], a gradual representation of trust is closer to the human way of expressing trust, whereas a probabilistic representation is more meaningful mathematically.

Problem Identification
This research has identified a potential improvement in identifying sentiment in sentences, especially those on social media, by combining the machine result with a crowdsourcing mechanism. Commonly used methods for determining sentiment include Naïve Bayes and Support Vector Machine (as well as those used in UData). The main problem in identifying sentiment in sentences with those methods is that some opinions are not simply positive or negative. Despite the heavy usage of sentiment analysis applications, there is still room for improvement. One of the remaining challenges in sentiment analysis is sarcastic and contradictory sentences. This research therefore proposes another mechanism to validate whether the sentiments are accurate, by involving human interaction through a crowdsourcing platform. The final result is expected to yield more confident sentiments from trusted crowdsource workers.

Figure 3. Model Design
As shown in Figure 3, the basic steps of the design for validating sentiment through human interaction are as follows.
1. The first objective of this research is to retrieve tweets with sentiments from online social media (Twitter). For this research the sentiment data is retrieved from Telkom UData. We use the keyword "indihome" to limit the tweet data to this specific Telkom product. The tweets are already labeled with positive, negative and neutral tags by the machine algorithm. This sentiment data is then saved to a local database; we call the sentiments gained from UData the pre-processing data.
2. For the crowdsourcing mechanism, we focus on the neutral-labeled tweets, because these are the tweets that cannot be recognized by the machine, so further analysis by crowdsourcing is needed. At this stage, crowdsource work means labeling the neutral sentences that cannot be recognized by the machine (which may include slang, sarcasm, contradiction, etc.). Each sentence will be labeled by at least 20 people [5].
3. In a parallel sociometric session, each crowdsource worker is judged for trust through a sociometric experiment in which the workers give trust weights to one another. This determines which workers have the highest level of trust.
4. At the result and analysis stage, we compare the crowdsourced result (as post-processing data) with UData (as pre-processing data) to find the performance increase in sentiment analysis gained from crowdsourcing.
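The labeling in step 2 and the trust weights from step 3 can be combined in more than one way, and the text does not fix an aggregation rule. The sketch below assumes a simple weighted vote: each worker's label counts once (plain majority) or in proportion to a trust weight, and a tie leaves the tweet neutral. The function name `aggregate_label`, the worker identifiers and the tie rule are illustrative assumptions, not the platform's actual implementation.

```python
from collections import defaultdict


def aggregate_label(votes, trust=None):
    """Combine crowd votes for one tweet into a single label.

    votes: list of (worker_id, label) pairs, label in
           {"positive", "negative", "neutral"}.
    trust: optional dict worker_id -> weight (hypothetical trust index).
           When omitted, every vote has weight 1 (plain majority vote).
    """
    totals = defaultdict(float)
    for worker, label in votes:
        weight = 1.0 if trust is None else trust.get(worker, 0.0)
        totals[label] += weight

    if not totals:
        return "neutral"  # no votes: keep the machine's neutral label

    best = max(totals.values())
    winners = [label for label, score in totals.items() if score == best]
    # Assumption: an exact tie is treated as "still neutral".
    return winners[0] if len(winners) == 1 else "neutral"


if __name__ == "__main__":
    votes = [("w1", "negative"), ("w2", "negative"), ("w3", "positive")]
    print(aggregate_label(votes))
    print(aggregate_label(votes, trust={"w1": 0.2, "w2": 0.2, "w3": 0.9}))
```

With equal weights the majority label wins; with trust weights, a single highly trusted worker can outweigh two less trusted ones, which is one possible reading of how trust is meant to affect a worker's validation.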

Data Collecting and Processing
The next step is data collection and processing, in which trust data is collected from the crowdworkers through a sociometric experiment. The experiment collects the trust that the crowdworkers place in one another. The collected data is then converted into the expected dataset format in order to calculate the socio-trust matrix.

Experiment Design
The initial step of the experiment is to choose appropriate workers to get the job done. As stated before, this experiment uses convenience sampling, a statistical method that selects people who are conveniently available to participate in a study because of their availability or ease of access. The crowdworkers involved in this experiment were chosen based on the following considerations:
1. The crowdworkers are Telkom employees, mostly educated to Bachelor's or Master's degree level, with customer-oriented job allocations, for example Account Manager, Call Center Agent, Solution, Technician and Networking.
2. The crowdworkers can access the experiment platform only during free time within their working hours.
Each crowdworker is shown the names of the other participants, who have been defined beforehand in the database as crowdsource workers on this platform. Crowdworkers give their level of confidence in each of these peers, and these ratings form the sociometric weights for all crowdworker peers.

Weighting Model
Sociometry is a quantitative method for measuring social relationships. In this research, sociometry is used to let the crowdsource workers rate each other's trustworthiness. Every worker can rate the trustworthiness of the other workers based on their social interaction. The more trust a worker obtains, the greater the effect on the result of that worker's sentiment validation. A sample sociomatrix is shown in Table 2, which is constructed to clarify weights and introduce terminology to illustrate the strength of sociometry [15].
Table 2. The Sociomatrix

Sociometric choice matrix (axes: chooser, chosen)
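To make the idea of a sociomatrix concrete, the following sketch computes one possible trust index per worker from a chooser-by-chosen matrix. It assumes that entry `[i][j]` is the rating worker `i` gives to worker `j` on a gradual scale from 0 to `max_rating`, that self-ratings on the diagonal are ignored, and that the index is the mean received rating normalized to the range 0 to 1. The scale, the normalization and the name `trust_index` are assumptions for illustration; the paper does not specify the exact formula.

```python
def trust_index(sociomatrix, max_rating):
    """Return one trust score in [0, 1] per worker.

    sociomatrix[i][j] is the rating that chooser i gives to chosen j
    (assumed gradual scale 0..max_rating). Diagonal entries are skipped
    because a worker does not rate themselves.
    """
    n = len(sociomatrix)
    if n < 2:
        raise ValueError("need at least two workers to rate each other")

    scores = []
    for chosen in range(n):
        received = [
            sociomatrix[chooser][chosen]
            for chooser in range(n)
            if chooser != chosen
        ]
        scores.append(sum(received) / (len(received) * max_rating))
    return scores


if __name__ == "__main__":
    # Hypothetical ratings among three workers on a 0..4 scale.
    matrix = [
        [0, 4, 2],
        [3, 0, 1],
        [4, 4, 0],
    ]
    print(trust_index(matrix, max_rating=4))
```

In this hypothetical matrix the second worker receives the maximum rating from both peers and therefore gets the highest index. Scores like these could serve as the weights in a trust-weighted vote.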

Data Source
As mentioned in section 2, the initial datasets are taken from the Telkom UData portal. We select tweets posted between January 2017 and April 2017. The tweets are retrieved by searching with the keyword "indihome", to limit the analysis to that Telkom product. The data is downloadable from UData and already carries a sentiment label assigned automatically by the machine-based algorithm. Of the 3,951 tweets collected, about 41% (1,601 tweets) are labeled positive, 25% (1,004 tweets) are labeled negative, and the other 34% (1,346 tweets) are labeled neutral.
Each crowdworker is asked to analyze over 300 neutral tweets, presented one at a time and chosen randomly without repetition. This number of tweets is based on a sample size formula, assuming a population of 1,346 tweets, a confidence level of 95% and a margin of error of 5%.
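The sample size calculation described above can be reproduced approximately. The paper does not name the formula, so the sketch below assumes Cochran's formula with a finite population correction, a z-score of 1.96 for 95% confidence and the most conservative proportion p = 0.5. Other formulas (for example Slovin's) would give a slightly different number.

```python
import math


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran sample size with finite population correction.

    population: number of items to sample from (here, neutral tweets).
    z:          z-score for the confidence level (1.96 for 95%).
    margin:     margin of error as a fraction.
    p:          assumed proportion; 0.5 gives the largest sample.
    """
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Correct for the finite population and round up to whole tweets.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    print(sample_size(1346))
```

Under these assumptions the function returns 300 for a population of 1,346 tweets, which is close to the number of tweets quoted in the text.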

Data Preparation
After the datasets are collected, they are filtered manually to remove content that is not customer opinion, for example tweets from Telkom's own accounts (@telkomcare, @telkomsolution, @telkompromo, @indihomewitel, etc.). Tweets from those accounts are mostly promotions or advertisements, which would obscure the purpose of the experiment: to perceive real customer opinion. Only neutral-labeled tweets are used for the experiment, on the assumption that these are the tweets the machine cannot recognize well.
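The filtering described above was done manually. Purely as an illustration, the same two rules (drop tweets from Telkom's own accounts, keep only neutral-labeled tweets) could be expressed as follows. The record fields `user`, `text` and `label` are hypothetical, and the account list contains only the four accounts named in the text, which the authors present as examples rather than a complete list.

```python
# Accounts named in the text; the original list ends with "etc.",
# so this set is known to be incomplete.
TELKOM_ACCOUNTS = {
    "telkomcare",
    "telkomsolution",
    "telkompromo",
    "indihomewitel",
}


def prepare_dataset(tweets):
    """Keep neutral-labeled tweets that were not posted by Telkom accounts.

    tweets: iterable of dicts with hypothetical keys
            "user", "text" and "label".
    """
    kept = []
    for tweet in tweets:
        user = tweet["user"].lstrip("@").lower()
        if user in TELKOM_ACCOUNTS:
            continue  # promotional or official content, not customer opinion
        if tweet["label"] != "neutral":
            continue  # only tweets the machine left unpolarized are used
        kept.append(tweet)
    return kept


if __name__ == "__main__":
    sample = [
        {"user": "@TelkomCare", "text": "promo indihome", "label": "neutral"},
        {"user": "@someone", "text": "indihome ...", "label": "neutral"},
        {"user": "@other", "text": "indihome ...", "label": "positive"},
    ]
    print(prepare_dataset(sample))
```

A rule-based filter like this would only be a first pass; the manual review described in the text can also catch promotional tweets from accounts that are not on the list.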

Profile of the Crowdworkers
In the initial plan, about 24 crowdworkers were to be involved in this experiment, but only 21 were ready to join. This was because the experiment was held during office hours, when most crowdworkers were still performing their main duties. About 81% of the crowdworkers are male (17 workers), while the other 19% are female (4 workers). 52% of them are 20-30 years old, and 48% are 30-40 years old. None of them is above 40 years old. All of them are full-time employees of PT Telkom, with varied expertise such as Account Manager, Call Center, Networking, Solution and Technician. They were assigned on the consideration that they are frequently involved with customers, are familiar enough with social media, understand the terms commonly used by netizens, and understand Telkom products, especially Indihome.

Experiment Results
The experiment starts with the crowdworkers logging in to the platform. The system logs their start time to record how long they use the crowdsource platform. After login, the crowdworkers are shown a sociometric form, which each of them must complete by stating their trust in the other workers. The form then continues to the crowdsource platform. In this form, the crowdworkers analyze whether each sentence is polarized positive or negative, and at the same time tag the keyword that polarizes the sentence. The keywords found also serve as the new training dataset for tweets that are still neutral. From the proposed crowdsource results we collected 110 keywords as a new training dataset for UData.
We also found some tweets (in the Indonesian language) containing anomalies that were not detected properly by the machine-based algorithm, as shown in Tables 3 and 4. A keyword is a word tagged by the crowdworkers which, in their judgement, makes the sentence polarized positive or negative, and a contra is a word found in the sentence that is generally known to carry its usual meaning. For instance, the word "gangguan", or "trouble" in English, is usually polarized as negative. However, since the sentence also contains the word "tahan", or "stand" in English, the crowdworkers decided that the overall meaning of the sentence is positive. This is an example of a contradictory sentence, which cannot be recognized by UData.
After some adjustment and filtering, the result is encouraging: it contains more polarized sentiments than before. The table below shows the results before and after the crowdsourcing process. A significant share, about 39%, of the neutral sentiments changed to positive or negative. We found that the most significant changes in sentiment produced by the crowdsource platform are due to the lack of data for unknown keywords in UData.
The crowdworkers provided considerable input for the new training dataset and caused a large difference between the UData result and the proposed crowdsource result.
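As a rough illustration of how tagged keywords could be turned into a training lexicon and how the share of re-polarized tweets could be measured, consider the sketch below. The input format, the function names and the sample words other than "tahan" are hypothetical; the actual structure of the 110 collected keywords is not described in the text.

```python
from collections import Counter, defaultdict


def build_lexicon(tags):
    """Map each tagged keyword to the polarity most workers gave it.

    tags: iterable of (keyword, label) pairs collected from crowdworkers.
    """
    counts = defaultdict(Counter)
    for keyword, label in tags:
        counts[keyword.lower()][label] += 1
    return {kw: c.most_common(1)[0][0] for kw, c in counts.items()}


def polarized_share(final_labels):
    """Fraction of formerly neutral tweets that ended up non-neutral."""
    if not final_labels:
        raise ValueError("no labels given")
    changed = sum(1 for label in final_labels if label != "neutral")
    return changed / len(final_labels)


if __name__ == "__main__":
    # "tahan" comes from the example in the text; "lemot" is a made-up entry.
    tags = [("tahan", "positive"), ("Tahan", "positive"), ("lemot", "negative")]
    print(build_lexicon(tags))
    print(polarized_share(["positive", "neutral", "negative", "neutral"]))
```

A lexicon built this way records only single words, so a contradictory sentence such as the "gangguan" and "tahan" example still needs the sentence-level label from the crowdworkers to be learned correctly.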

Figure 4. Paired Sample Test
In a Paired Sample T-Test, each subject or entity is measured twice, resulting in pairs of observations. In this experiment the two observation sources are UData (pre-processing) and the proposed crowdsourcing (post-processing). The Paired Samples Correlation table adds the information that the pre-process and post-process results are significantly positively correlated (.425).
From the Paired Sample T-Test result (Figure 4), the probability value Sig. (2-tailed) = .000 is below α = 0.05, which means there is a significant change between the pre-process and post-process results. Our hypothesis that there are significant changes in UData sentiment after post-processing is therefore accepted.
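For readers unfamiliar with the test, the paired t statistic can be computed from the per-tweet differences between the two labelings. The sketch below assumes that labels are encoded numerically as negative = -1, neutral = 0 and positive = 1; the paper does not state the encoding used, and the sample data is invented for illustration only.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired observations.

    pre, post: equal-length sequences of numeric scores for the same
               tweets before and after crowdsourcing.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical data: six tweets, all neutral (0) before crowdsourcing.
    pre = [0, 0, 0, 0, 0, 0]
    post = [1, -1, 1, 1, 0, 1]
    print(paired_t(pre, post))
```

Converting the statistic into a two-tailed probability requires the t distribution with n - 1 degrees of freedom; statistical packages such as SciPy (`scipy.stats.ttest_rel`) or SPSS report that value directly.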

Conclusion
The experiment shows a significant improvement in UData performance after applying the training dataset derived from crowdsource collaboration. Using the crowdsourcing approach, we found that sarcastic and contradictory sentences can be recognized by humans and utilized as new training datasets for further machine learning. In this experiment, the approach is likely to increase the accuracy of the sentiments in UData, with up to 39% of neutral sentiments becoming polarized as positive or negative. Sociometry is used in this experiment to identify trustworthiness among the crowdsource workers, in order to make sure the workers can be trusted to validate the sentiments. The index obtained from historical sociometric data can be useful later for determining which crowdworkers are capable enough, and trusted enough by other workers, to be assigned to future crowdsource work.
As a recommendation, note that UData is a public, global sentiment analytics platform, not one devoted only to enterprise use. It should therefore be considered whether it is appropriate to train its dataset from a corporate or product-based point of view. Further work should separate the functions or privileges so that the platform can serve both global and enterprise purposes.