Depression Detection from Social Media Posts Using Multinomial Naive Theorem

More than 350 million people worldwide suffer from the mental disorder known as depression. An individual suffering from depression functions below average in life, is vulnerable to other diseases and, in the worst case, may be led to suicide. Several barriers prevent expert care from reaching people with depression in time, including the social stigma associated with mental disorders, a lack of trained health-care professionals, and ignorance of the signs of depression owing to a lack of awareness of the disease. The World Health Organization (WHO) suggests that people who are depressed are often not correctly diagnosed, while others who are misdiagnosed are prescribed antidepressants. Thus, there is a strong need to assess the risk of depression automatically. Social media platforms are coming ever closer to being a true digitization of the human social experience, and in many cases people prefer to express themselves online rather than offline. In this paper, we use Facebook comments as the data set and, based on them, categorize users as depressed or non-depressed.


Introduction
Social media platforms are coming ever closer to being a true digitization of the human social experience. In many cases people would in fact prefer to express themselves online rather than offline [1]. A study by Marriott and Buchanan shows that there is no significant difference between an individual's online personality and offline personality in terms of authenticity. In that study, an individual's online personality refers to the personality traits that can be inferred from their social media activity [2], whereas the offline personality is what people think they are as an individual self. Moreover, the number of social media users has grown tremendously: 81% of the entire population of the world was in possession of a social media account as of the year 2017.
In this paper, depression refers to Major Depressive Disorder (MDD). An individual may be suffering from depression if they express themselves using phrases that show signs of major irritation, anger, loss of interest in hobbies, constant disappointment, suicidal thoughts, soreness or agony, headaches, cramps or digestive problems [2,17]. These expressions usually occur persistently over a period of time without a clear physical cause. A personal or family history of depression, a drastic change in life pattern, burden, and certain physical illnesses and medications can be risk factors or potential causes of depression. It is not uncommon for users of social media platforms to share major life changes and sources of stress and, in fewer cases, to indicate that they are physically ill. The primary goal of early detection is to identify users showing signs of depression so that proper diagnosis and treatment may follow [19]. Untreated depression may lead to many undesirable effects, such as drifting towards drugs and alcohol, which in turn makes it difficult to recover from serious illness, and the disease may persist for years under clinical treatment [3]. Figure 1 shows the worldwide disease burden of MDD and dysthymia combined as of the year 2017. Dysthymia is a different depressive disorder that has a prolonged impact but is less severe than MDD. The numbers are disturbing: approximately 322 million people suffer from depressive disorders worldwide, and this number is expected to grow each year. There are many barriers that stand in the way of proper health care.
Additionally, Figure 1 suggests that the larger fraction of people suffering from depression live in low-income countries, which reduces the likelihood of these people receiving expert care. Clearly, there is an urgent need to tackle depression by identifying risks as early as possible using social media platforms. Identifying depressed people online can pave the way for locating and helping people who need professional care but do not have access to it owing to the aforementioned barriers.

Background
S. Sridharan et al. designed a model to detect depression using Support Vector Machine (SVM) and Naïve Bayes classifiers. Their Depression Prediction Model was developed using RapidMiner [2]. The model contains separate processes to evaluate all classifiers, the SVM classifier, and the Naïve Bayes classifier. Two datasets and seven primary operators constitute the model. The main idea of their research is to discover the connection between the behaviours of SNS users and mental health diseases.
Munmun De Choudhury et al. used crowdsourcing to retrieve Twitter data and built an SVM classifier to predict depression. Keumhee Kang et al. used a crawling technique: the data is obtained from Twitter through an open API, and the crawl is performed using keywords or continuous streaming. Maryam Mohammed Aldarwish and Hafiz Farooq Ahmed developed a web application that categorises social media users into one of four levels of depression: low, mild, moderate, and extreme. Their data is obtained from Facebook and Twitter, and a BDI-II survey is prepared. Yoshihiko et al. developed the "Utsureko" mobile phone application to collect user data and used deep learning to produce a model for the detection of depression; their results show that the framework can predict severe depression with high precision using the user's history [15]. Individuals are linked with each other on social media, which builds bonds between them. Eric Gilbert et al. used the Firefox extension Greasemonkey to collect data and LIWC for analysis in order to predict tie strength on social media. A user's personality is also relevant: the way people express themselves in front of others reflects their character, and a user's personality can be predicted by analysing their Facebook statuses, comments, photos, liked pages, and other activity.
S. C. Guntuku, D. B. Yaden, M. L. Kern, L. H. Ungar, and J. C. Eichstaedt noted in their investigation that night-time social media use and emotional investment in social media can influence sleep quality and levels of depression, since using social media at night may adversely affect sleep in adolescence. Their findings indicate that social media use, particularly at bedtime, is a key factor affecting young people's sleep quality and their levels of anxiety and depression, i.e. poorer sleep quality and increased anxiety and depression. The study by Kimberly A. Van Orden et al. suggested that women are likely to experience several risk factors that indicate the presence of thwarted belongingness and perceived burdensomeness.
Quan Hu et al. predict depression in users from Sina Weibo data, using the Chinese text-analysis software "Wen Xin" to process the content. For feature selection they applied the Greedy Stepwise algorithm, built a classification model using logistic regression, and concluded that it is practical to predict whether a user is depressed by means of social media.
Individuals with mental disorders regularly face anxiety problems, which may eventually develop into depression. Hence, authors have turned to online communities for data. The data is crawled from several online communities with many users, and psycho-linguistic features are extracted from the posts by topic to serve as input to the model. Machine learning strategies are applied to create a joint model for recognizing mental-health-related features. Machine learning and artificial intelligence can thus be applied for the betterment of the general public. Many machine learning methods, such as support vector machines, naïve Bayes, decision trees, k-nearest neighbor and logistic regression, have been proposed by numerous authors. Here the targeted group of individuals is students and working professionals. Social media is used to retrieve data from people.
In this study, comments from Facebook and tweets from Twitter are used to detect factors that may indicate signs of depression in the relevant social media users. Many machine learning techniques have been researched for this detection. Keeping in mind the key objective of this paper, we point out various challenges that have to be addressed in this research. Whenever a user expresses their feelings in a Facebook post or comment or in a tweet, the text may reflect an emotional state such as 'happiness', 'anger', 'sadness', 'dullness' or 'excitement'. Using effective machine learning classification techniques, we can analyze the data gathered through comments and draw conclusions about its various aspects. After the data is accessed, it is cleaned of any inconsistencies and then analyzed using Python. The study focuses on various linguistic cues helpful in detecting emotion-based events and the area in which the event was caused. The analysis also focuses on the experience related to the emotion keywords. The keywords can describe an emotional process, such as positive feelings (e.g. 'enthusiastic', 'wonderful', 'affection', 'fondness'), negative feelings (e.g. 'unworthy', 'defeated', 'discomfort', 'dirty', 'awful'), sorrow (e.g. 'misery', 'weeping', 'agony', 'heartache') and anger (e.g. 'stop', 'shit', 'hate', 'kill', 'annoyed'). They also include a temporal (momentary) process, such as a focus on the present (e.g. 'today', 'is', 'at this moment'), a focus on the past (e.g. 'previous', 'had', 'yesterday') and a focus on the future (e.g. 'shall', 'may', 'will', 'soon'). Finally, they include grammatical parts of speech such as articles ('a', 'an', 'the'), prepositions (e.g. 'into', 'above', 'to', 'behind', 'under'), verbs (e.g. 'do', 'have', 'am', 'will'), conjunctions ('and', 'but', 'or'), pronouns (e.g. 'I', 'they', 'she', 'he', 'it', 'these', 'that') and negations (e.g. 'never', 'disagree', 'lying', 'no', 'not').
This paper elaborates how we have used Facebook comments to identify the depression level from the type of behavior the comments show. We followed three steps to achieve this [4].
First, the material is processed using various emotion detection techniques used to detect depression. Second, the patterns are categorized into different features for our specific research problem. Third, our experiments are carried out on datasets of Facebook user comments as well as Twitter tweets, producing output in the form of depressed and non-depressed labels together with the intensity.

Methodology
The features used in this paper are extracted from the combined effect of the emotional, temporal and grammatical processes, and are used for detecting and processing the depressive data retrieved from social media.
Supervised machine learning is then applied to each feature category independently. Various classification techniques, such as decision tree, k-Nearest Neighbor, Support Vector Machine and Naïve Bayes, can be used; here we use the last of these for the implementation [6].
The following algorithm describes the flow of the work.
Step 1 - Retrieve the data sets (Twitter / Twitter API / Facebook / Facebook API).
Step 2 - Identify the prerequisites to train the Naïve Bayes classifier.
Step 3 - Compute the TDM (term-document matrix) for each class.
Step 4 - Convert the data set into a frequency table using TF-IDF (term frequency-inverse document frequency, which reflects how important a word is to a document in a collection).
Step 5 - Use the Naïve Bayes rule to calculate the depressed and non-depressed words per comment / post.
Step 6 - Classify the total comments / posts into depressed and non-depressed comments.
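Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names, the whitespace tokenizer, the sample posts and the unsmoothed weight tf * log(N / df) are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a post and keep only purely alphabetic tokens (assumed tokenizer)."""
    return [tok for tok in text.lower().split() if tok.isalpha()]

def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows

def tf_idf(rows):
    """Weight each count as (count / doc length) * log(N / document frequency)."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_terms)]
    weighted = []
    for r in rows:
        total = sum(r) or 1  # avoid division by zero for empty documents
        weighted.append([(r[j] / total) * math.log(n_docs / df[j])
                         for j in range(n_terms)])
    return weighted

# Hypothetical posts, used only to exercise the functions.
docs = ["i feel sad and tired", "what a wonderful day", "sad sad day"]
vocab, rows = term_document_matrix(docs)
weights = tf_idf(rows)
```

A word that occurs in every document receives weight zero under this variant, which is why the weights highlight words that distinguish one post from the others.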

a) Data Analysis
This work uses Facebook users' comments as well as Twitter comments. We combined them, using part of the data as the training set and the rest as the testing set. The data is organized by emotional, grammatical and time-based features. The data is extracted using the Facebook API and the Twitter API, and each feature type is then processed independently. Once the data is prepared, it is ready for further analysis and implementation.

Figure 2: Research Methodology
We used the Facebook API and the Twitter API to collect data from Facebook and Twitter respectively. APIs are a powerful tool for qualitative data analysis: they make it possible to organize the data, break it down and discover knowledge from it. The data can take any form, such as open-ended survey responses, social media posts, interviews, articles or web content [16]. The more data is gathered, the more accurate the result. The data set has to be chosen appropriately for the analysis to be accurate; factors such as the size, correctness and integrity of the data play a pivotal role in the research.

b) Dataset Preparation
The linguistic process covers word count, words per sentence, pronouns, personal pronouns, articles, prepositions, auxiliary verbs, adverbs, conjunctions and negations, as well as other grammar categories: verbs, adjectives, comparisons, interrogatives, numbers and quantifiers.
An emotional process is any process in which turbulence in emotions is observed, or a strong feeling of emotion, usually comprising psychological, cognitive, subjective and expressive processes. Everybody experiences emotional processing, but in the majority of cases it is not permanent and goes away with time. Some people, however, stay in this turbulence, and that is where the focus should be, as they need help [17,18]. If these data are processed properly, they could help in flagging patients with depression.
We took a dataset of 7146 Facebook comments and created a dictionary file with 8222 emotion words. In the data dictionary we identified the total number of positive, negative and neutral emotion words, which are shown in the dictionary. The sentiment dictionary is treated as the target set or training set against which words from the Facebook comment dataset are compared. The third task was to write code that identifies the negative, positive and neutral words in the Facebook comment dataset by comparing them with our training dataset, the sentiment dictionary. The code categorizes positive and negative words and creates a list of each. It is written in Python using pandas.
The sentiment dictionary file is extracted and categorized into positive and negative emotions. The comments are then retrieved from the Facebook dataset and matched against the sentiment dictionary, and the total number of comments indicative of depression and non-depression is evaluated [7]. The degree of depression is calculated from the number of negative words that are matched. In the output, a negative sign indicates depression: for example, a level of -29 means that the total number of depression-indicative words is 29. The same holds for positive words, where a positive value means that the Facebook user is not at all depressed.
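The dictionary-matching step can be sketched as follows. This is an illustrative example under stated assumptions, not the code used in this work: the word sets are tiny hypothetical stand-ins for the 8222-word dictionary, and combining the matches into a single signed score (positive matches minus negative matches) is an assumption, since the exact combination rule is not specified here.

```python
def score_comment(comment, positive_words, negative_words):
    """Return a signed score: positive matches minus negative matches (assumed rule)."""
    # Strip surrounding punctuation so that "awful," matches "awful".
    tokens = [t.strip(".,!?;:\"'()").lower() for t in comment.split()]
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    return pos - neg

def label_comment(score):
    """Map the signed score to one of the three groups used in the bar chart."""
    if score < 0:
        return "depressed"
    if score > 0:
        return "non-depressed"
    return "neutral"

# Hypothetical stand-ins for the sentiment dictionary.
positive_words = {"wonderful", "enthusiastic", "affection"}
negative_words = {"awful", "defeated", "unworthy"}

score = score_comment("I feel awful, defeated and unworthy.",
                      positive_words, negative_words)
label = label_comment(score)
```

Under this rule a comment with three depression-indicative words and no positive words scores -3, which mirrors the signed output described above.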

Fig 3: Bar Chart for The Depressed, Non-Depressed and Neutral Users Based on Facebook Comments
After categorizing the comments, the degree of depression and non-depression was found. Based on the output of code 3, a scatter chart was prepared showing these degrees. From the chart we can conclude that comments indicating low depression are the most numerous, those indicating moderate depression are average in number, and those indicating high depression are the fewest. We can start counselling users showing low and moderate depression and raise an alarm for users showing high depression.
The whole concept of naive Bayes classification depends on Bayes' theorem. The probability of a particular object belonging to a particular class when the feature values are given is called the posterior probability. Naive Bayes works on the assumption that the samples in the dataset are independent and identically distributed: the random variables are independent of one another and are drawn from the same probability distribution [8,11]. Observations are said to be independent when the probability of one observation does not affect the probability of another (e.g., time series and network graphs are not independent). In addition, the features are assumed to be conditionally independent given the class, so the likelihood of a pattern p = (p_1, ..., p_d) given that it belongs to class w_j is
$P(p \mid w_j) = P(p_1 \mid w_j) \cdot P(p_2 \mid w_j) \cdots P(p_d \mid w_j)$
It can be stated as the product of the likelihoods of the individual features in the feature vector.
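For reference, the way this likelihood enters the classification decision can be written out with Bayes' theorem. The formulas below are the standard textbook form rather than something specific to this work; the symbol $\hat{w}$ for the predicted class is introduced here for illustration, and the logarithm is used only because it turns the product into a numerically stable sum.

```latex
P(w_j \mid p) = \frac{P(p \mid w_j)\, P(w_j)}{P(p)},
\qquad
P(p \mid w_j) = \prod_{k=1}^{d} P(p_k \mid w_j)

\hat{w} = \arg\max_{j} \Big[ \log P(w_j) + \sum_{k=1}^{d} \log P(p_k \mid w_j) \Big]
```

The evidence $P(p)$ is the same for every class, so it can be dropped when only the most probable class is needed.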

Prior Probabilities:
The posterior probabilities are estimated from the class-conditional probabilities and the prior probabilities. Here the priors are assumed to follow a uniform distribution [9]. Since the evidence term is a constant, the final decision then depends only on the class-conditional probabilities. In general, prior knowledge can be obtained either by estimating it from the training set or by consulting a domain expert.
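The two ways of setting the priors can be illustrated with a short Python sketch. The function names and the label strings are hypothetical, and the training labels are invented for the example.

```python
from collections import Counter

def estimate_priors(labels):
    """Estimate each class prior as its relative frequency in the training labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

def uniform_priors(classes):
    """Assign the same prior to every class (the uniform case)."""
    classes = list(classes)
    return {c: 1 / len(classes) for c in classes}

# Hypothetical training labels.
labels = ["depressed", "non-depressed", "non-depressed",
          "depressed", "non-depressed"]
empirical = estimate_priors(labels)
uniform = uniform_priors(set(labels))
```

With uniform priors the prior term is identical for all classes, so the predicted class is decided by the class-conditional probabilities alone.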

d) Multinomial Naïve Bayes Independence Implementation
In our research we use the Multinomial Naive Bayes independence assumptions. First, we assume that position does not matter in the bag of words, which we call the sentiment bag. Second, we assume conditional independence: the feature probabilities P(x_i | c_j) are independent given the class c_j. It is shown by the following equation.
$P(x_1, \dots, x_n \mid c_j) = P(x_1 \mid c_j) \cdot P(x_2 \mid c_j) \cdots P(x_n \mid c_j)$
The class-conditional probability of each word is estimated from the number of occurrences of word w_i among all words in documents of a particular topic c_j, where V is the vocabulary:
$\hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V} \mathrm{count}(w, c_j)}$
4. Unlike decision trees, it does not suffer from fragmentation, especially with small data, so decision trees can easily be replaced by Naive Bayes in such cases.
5. It is efficient if the assumption of independence holds: if the assumed independence is correct, the Bayes optimal classifier is the best solution for the problem [12].
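A minimal multinomial Naive Bayes classifier over a bag of words can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this work: the class name SentimentBagNB, the whitespace tokenizer and the training posts are hypothetical, and add-one (Laplace) smoothing is an added assumption that keeps unseen words from producing zero probabilities.

```python
import math
from collections import Counter, defaultdict

class SentimentBagNB:
    """Multinomial Naive Bayes over word counts, with add-alpha smoothing (assumed)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.class_counts = Counter(labels)      # class -> number of documents
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log posterior: log prior plus summed log word likelihoods."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for w in doc.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

# Hypothetical training posts.
train_docs = ["i feel hopeless and tired",
              "life is awful and i feel defeated",
              "what a wonderful day",
              "i am enthusiastic about today"]
train_labels = ["depressed", "depressed", "non-depressed", "non-depressed"]
model = SentimentBagNB().fit(train_docs, train_labels)
```

Calling model.predict on a new post returns the class with the larger log posterior; word order plays no role, in line with the bag-of-words assumption.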

e) Implementation for Term Frequency
We used the following pseudocode to find the term frequency. Here, term frequency means the number of occurrences of a pattern or word in a particular document. Each word is matched with the training set. The occurrence of negatively indicative words can help in detecting depression early.
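A term-frequency count of this kind can be sketched in Python as below. It is a hypothetical illustration rather than the authors' pseudocode: the function names, the whitespace tokenizer, the normalization by document length and the sample word set are assumptions.

```python
from collections import Counter

def term_frequency(document):
    """Return each word's share of the tokens in one document."""
    tokens = document.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    # An empty document yields an empty dictionary, so no division by zero occurs.
    return {term: n / total for term, n in counts.items()}

def negative_term_frequency(document, negative_words):
    """Keep only the terms that also appear in the negative word set."""
    tf = term_frequency(document)
    return {t: f for t, f in tf.items() if t in negative_words}

# Hypothetical negative words standing in for the training set.
negative_words = {"hate", "awful"}
tf_negative = negative_term_frequency("i hate this awful awful day", negative_words)
```

Here tf_negative holds the relative frequency of each matched negative word, which can then feed the depression-indicative count for the post.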

Conclusion and Future Work
This research work results in the categorization of the level and degree of depression. In this paper a training set is provided, and social media posts are classified according to the training set. This can further be used to raise an alarm or send a notification to the user's relatives. Future work includes developing a system that can send the degree of depression to a medical team or to the patient's relatives.