Classification of project management tool reviews using machine learning-based sentiment analysis

Managing the daily responsibilities of an organization is a major task for company administrators. A dedicated project management tool aids every phase of project management. With a variety of project management tools available in the market, finding a tool appropriate to the needs of the organization is demanding. Scholars have observed that project management tool users tend to read the reviews and comments on a product before deciding which tool to implement. This study applies sentiment analysis, which can help identify an effective and efficient project management tool. Microsoft Office Project is one of the most popular and most reviewed tools for project and portfolio management. The researchers manually extracted data from various websites containing comments and reviews from different users or reviewers. A machine learning-based approach using RapidMiner was utilized to analyze the collected web reviews. Sentiment analysis of the MS Project reviews was performed using the supervised learning algorithm K-nearest neighbor (KNN). The first level of classification involved classifying statements as “satisfied” or “dissatisfied.” The second level of classification involved categorizing sentiments as cost, experience, task, support and interface. More statements were classified as satisfied than as dissatisfied. The best features to apply when classifying the PM tool reviews dataset are stopword filtering, token-length filtering, stemming and trigrams.


Introduction
Many information technology (IT) projects begin with bright concepts and enormous investments in various resources [1]. Moreover, an IT project can be a challenging undertaking: it must be delivered on time and within budget, it is expected to reach a declared business benefit, and it relies on a fast-changing and growing range of tools used to carry out the management of operations in a company [2]. Project management (PM) is one of the fastest emerging disciplines in organizations today and is used for various purposes, such as product development and improvement, system deployment, process building, process re-engineering, new service initiation, marketing campaigns and software development [3]. Implementing project management throughout an organization helps create a planned value chain that gives corporations a lead over their rivals [4]. Information technology project management has changed significantly over the previous decade, from traditional scheduling and resource management to a complete system that supports entire project programs, project life cycles, and project portfolios [5]. As a result, many companies are reaping rewards from their investments, including lower budget requirements, better efficiency, enhanced satisfaction among consumers and stakeholders, and an edge over rival companies [4]. These days, a wide selection of project management suites is offered in the market. However, despite the claims made for PM software tools and their adoption by different organizations, most projects do not achieve much success [6]. Online product reviews strongly influence users' purchase and usage decisions because they help lessen doubts about product quality and about how well a product aligns with users' requirements [7][8].
Many software users take advantage of the internet, which can be accessed from anywhere around the globe, to read online software product reviews before they buy a product.
MS Office Project, commonly known as MS Project, is a product developed and distributed by Microsoft. It is a range of applications that can make project management easier and better organized. Thousands of reviews of the software have been posted online. These reviews vary in level of opinion and area of focus, which motivated this study. The researchers conducted a sentiment analysis of MS Project reviews to further identify the strengths and weaknesses of the PM tool with regard to cost, experience, tasks, support, and interface, for the benefit of project managers who are considering the tool.
Sentiment analysis (SA) is a technique for analyzing people's sentiments, views, ideas and attitudes, among others, towards issues and matters of interest to a company [9]. It is also known as opinion mining and extraction, sentiment extraction, affect analysis and emotion analysis. Furthermore, sentiment analysis is commonly applied in advertising, movie reviews and other settings that capture customers' feedback [10]. One of the three main approaches to sentiment analysis is the machine learning-based approach, which requires a corpus of tagged examples. The K-nearest neighbor (KNN) algorithm was used in this study. KNN is easy to understand and implement. It also tolerates modification, as many of the stored data objects can be eliminated while still retaining the classification accuracy of the KNN classifier [11].

Objectives of the study
This study was intended to classify sentiments about the MS Project PM tool. Specifically, this study aimed to provide analysis on the: 1. categories to be used in classifying the sentiments; 2. feature selection; 3. training and testing of the classifier; and 4. overall sentiment towards the PM tool.
Related Literature
Sentiment analysis
Sentiment analysis is a natural language processing and information extraction task that aims to capture reviewers' comments and queries by evaluating a large number of documents [12]. Its use has grown in recent years as internet usage and the exchange of public opinion have increased, which has become the driving force of the discipline [13]. Sentiment analysis is geared towards establishing common ground among the varying thoughts of people concerning a certain topic. People can post their sentiments through different media, such as product reviews, online forums, blogs and social media sites. Several software packages are available as tools for sentiment analysis. Examples are General Architecture for Text Engineering (GATE), KH Coder, Coding Analysis Toolkit, TAMS, QDA Miner Lite, VisualText, Datumbox and RapidMiner, which was used in this study. Each of these tools claims strengths such as ease of use and topic clustering ability.

Supervised Classification Algorithms
To predict discrete classes, an algorithm that 'learns' the patterns in the data should be applied. This is where supervised classification algorithms come in. They are valued for their flexibility [14]. The K-nearest neighbor (KNN) supervised classification technique was tested in this study. As discussed in [15], KNN is a type of instance-based or lazy learning, in which the function is only approximated locally and all computation is deferred until classification. It is a non-parametric technique used for classification or regression; it is called non-parametric because it does not assume a functional form. In industry, however, KNN is mostly used for classification problems. Generally, three aspects are important when evaluating classification techniques: a) ease of interpreting the output, b) calculation time, and c) predictive power. KNN is widely used for its ease of interpretation and low calculation time.
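The lazy, instance-based behaviour of KNN described above can be illustrated with a minimal sketch. This is not the RapidMiner implementation used in the study; the Euclidean distance, the value k = 3, and the toy two-dimensional vectors are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Lazy learning: nothing is fitted in advance; all work happens at query time.
    distances = sorted(
        (euclidean(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up 2-D feature vectors (not data from the study).
train_vectors = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["satisfied"] * 3 + ["dissatisfied"] * 3

print(knn_predict(train_vectors, train_labels, (1.1, 1.0)))
print(knn_predict(train_vectors, train_labels, (4.0, 4.0)))
```

Because no model is built ahead of time, the cost of each prediction grows with the number of stored examples, which is why pruning stored data objects matters in practice.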

Product Reviews
Nowadays, a growing number of e-commerce platforms offer product reviews [16]. Product reviews are a great source of first-hand experiences from users of a product. Satisfaction and dissatisfaction ratings enable consumers to make wiser purchasing decisions. On the other hand, product owners can use the reviews to improve or enhance their product when the need arises.

Methodology
The study followed the methodology shown in Figure 1. The steps involved are data extraction from the internet, data cleaning, data pre-processing, feature selection and classification.
Data extraction is the procedure of collecting data from a source. In this study, data extraction was performed manually. The following are three of the web pages considered in gathering data:
• https://www.getapp.com/project-management-planning-software/a/Microsoft-project/reviews/
• https://www.capterra.com/p/1419/MS-Project/
• http://technologyadvice.com/products/project-pro-reviews/
The data gathered from these three sites made up the majority of the data considered in the sentiment analysis. Overall, 1242 software product reviews were collected. Before being exported to a CSV (Excel) file, the mined data was saved in a local database.
The 1242 collected reviews underwent data cleansing. The paragraphs were broken down into sentences, since this study focuses on sentiment analysis at the sentence level. Non-English words were removed, so all retained data were in English, and 1564 sentences were produced. An annotation scheme was developed for the analysis of the texts in the corpus. The annotator was tasked with identifying the text categories related to the reviews; the categories are presented later in this paper. The annotator also classified each review according to whether it denotes satisfaction or dissatisfaction.
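The step of breaking review paragraphs into sentences can be sketched as follows. The study does not state how the splitting was done, so the regular-expression rule below is an assumption, and the review text is made up for illustration.

```python
import re

# Split after ., ! or ? when followed by whitespace and an uppercase letter or digit.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_into_sentences(review):
    """Break a review paragraph into trimmed, non-empty sentences."""
    parts = _SENTENCE_BOUNDARY.split(review.strip())
    return [p.strip() for p in parts if p.strip()]


# Made-up review text for illustration only.
review = "The interface is clunky at times. Fast data entry! Would I buy it again? Probably."
for sentence in split_into_sentences(review):
    print(sentence)
```

A rule this simple will wrongly split after abbreviations that are followed by a capitalized word, so a manual check of the resulting sentences would still be needed.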
In this study, the software RapidMiner was used. RapidMiner [17] is a software tool for data preparation, deep learning, text mining, machine learning, and predictive analytics. Using this software, the MS Project sentiment analysis ran through the following process. The operator 'Retrieve MSProject_Reviews' reads the database where the PM tool reviews dataset was saved. The operator's output is the metadata derived from transforming the loaded data by RapidMiner in the database. The 'Cross Validation' operator was used to train and test the classifier using 10-fold cross-validation. k-Nearest Neighbor (KNN) was the algorithm considered in training and testing the classifier.
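The idea behind 10-fold cross-validation can be sketched in plain Python. This is not RapidMiner's 'Cross Validation' operator; the function names, the shuffling seed, and the `majority_baseline` stand-in classifier are hypothetical and exist only to exercise the splitting logic. The label counts reuse the 833 and 731 sentences reported in this paper, but the items are placeholders.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validate(items, labels, train_and_predict, k=10):
    """Return the mean accuracy of `train_and_predict` over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(items), k):
        predicted = train_and_predict(
            [items[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [items[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_items, train_labels, test_items):
    """Hypothetical stand-in classifier: always predict the most common training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return [most_common] * len(test_items)


# Label counts follow the reported distribution; the items are placeholders,
# not the real sentences.
labels = ["satisfied"] * 833 + ["dissatisfied"] * 731
items = list(range(len(labels)))
print(round(cross_validate(items, labels, majority_baseline), 4))
```

Each sentence is used for testing exactly once and for training in the other nine folds, so the averaged accuracy reflects performance on data the classifier did not see during training.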

Result and Discussion
The classification scheme used to categorize the data is composed of five categories, as follows:
• Cost: sentiments about the price of MS Project.
• Experience: sentiments about the experience, occurrence, and familiarity of users with MS Project.
• Tasks: sentiments regarding project management tasks, activities, and features.
• Support: sentiments regarding online community support, manuals and references.
• Interface: sentiments of the users regarding the user interface of the tool.
The categories were derived from the themes of the software product reviews. Table 1 shows examples of statements annotated according to the annotation scheme used in this study.

| Category | Satisfied | Dissatisfied |
| --- | --- | --- |
| Experience | It works well as a teaching tool because the students tend to understand quickly and easily. | If a group is not tech savvy, it takes some time in getting used to it. |
| Tasks | Fast data entry | Formatting the header to accommodate various icons or logos is not easy |
| Support | Easy to find the correct help video | Support is limited to forums and a 500$ per hour fee |
| Interface | I like that the interface allows me to enter the data about the project into what looks like a spreadsheet. | The interface is clunky at times and not always intuitive. |

Table 2 shows the data distribution as annotated based on the scheme. Of the 1564 sentences considered in this study, 833 expressed satisfaction and 731 expressed dissatisfaction, as perceived by the annotators.
For feature selection, the algorithm was tested using 10-fold cross-validation. Features were added after running the algorithm. Table 3 shows the result of the 10-fold cross-validation without any features added; this baseline gives the lowest accuracy and F-scores. A slight increase can be seen when features such as filter_stopwords, filter length, and stem were added. Further, when n-gram (3) was applied to the baseline, the F-scores and the accuracy all increased. It is noteworthy that the F-scores of all categories conform to the 10-fold cross-validation (see Table 2), except for the "interface" category, where dissatisfaction is higher than satisfaction. The same category gave the highest F-score, 0.75, on "satisfaction."
Table 4 shows the result of the classification using the K-NN algorithm. It shows the accuracy, precision, recall and F-measure scores.
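The token-level features discussed in this section (stopword filtering, length filtering, stemming and n-grams up to length 3) can be sketched as a small pipeline. This only approximates the idea and is not the RapidMiner operators used in the study: the stopword list, the length bounds, the underscore-joined n-gram format and the `crude_stem` helper (a rough stand-in for a real stemmer) are all assumptions.

```python
import re

# Small illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"a", "an", "and", "at", "is", "it", "of", "the", "to"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        long_enough = len(token) - len(suffix) >= 3
        if token.endswith(suffix) and not token.endswith("ss") and long_enough:
            return token[: -len(suffix)]
    return token


def extract_features(sentence, min_len=3, max_len=25, max_n=3):
    """Tokenize, drop stopwords and out-of-range tokens, stem, then build n-grams up to max_n."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]
    tokens = [crude_stem(t) for t in tokens]
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append("_".join(tokens[i : i + n]))
    return features


print(extract_features("The interface is clunky at times and not always intuitive."))
```

Keeping "not" out of the stopword list is a deliberate choice in this sketch, since negation can flip the sentiment of a sentence; trigrams such as "not_alway_intuitive" then preserve that context for the classifier.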

Conclusion
In this study, machine learning using RapidMiner was performed. The following conclusions were derived from the experiment. The first level of classification involved classifying statements as "satisfied" or "dissatisfied." The second level of classification involved categorizing sentiments as cost, experience, task, support and interface. Statements classified as "satisfied" outnumber those classified as "dissatisfied." Statements are mostly task-related. In addition, stopword removal, token-length filtering, stemming and n-grams are the features used in this study. The best features to apply when classifying the PM tool reviews dataset are stopword filtering, length filtering, stemming and trigrams. These features, when added to the classifier, increase the F-score and accuracy, which means that they enhanced the classifier. Overall, all accuracy values obtained during testing were above 50%, which indicates a relatively accurate classification. The algorithm achieved an accuracy of 70.83%.