Hi, I'm Mimamsa. I'm currently an undergraduate, and for the past three years my research has revolved around sentiment analysis and its applications in various domains. Last year I finally settled on using sentiment analysis in healthcare, especially for psychological issues. Today I won't be going into coding techniques; this is mostly an overview of the basic ideas behind sentiment analysis, the basic NLP techniques, and the Python libraries and packages you can use to perform those tasks. This is the outline of the talk: it starts with an introduction, then NLTK, the sentiment labeling tasks, the uses of sentiment analysis, and finally sentiment in healthcare.

So what is natural language processing? When a one-year-old child looks at a piece of text, they cannot make sense of what they are reading, but a ten-year-old child can understand much more of it. A computer, as we have always known, has no intelligence of its own. So how do we make it learn to read natural language and to summarize it? How can it know the grammar, understand the context, and work out what the language is actually supposed to say?

Why do we process natural language with computers at all, if humans can process it so well? One reason is question-answering systems. Another is scale: if there is a Facebook post targeted at you, its creator needs to know the reactions to it, and if there are millions of people on that social media site, there is very little chance that every post and every comment will be read by the creator. They need the general opinion, the overall inclination towards the product, the ad, and so on. That is why we go to natural language processing.

Now, what is sentiment analysis? Sentiment analysis is identifying and characterizing the sentiment of a basic unit of text. Initially it was just about positive, negative, or neutral, but these days it is combined with emotion analysis, so it also aims to infer whether you are angry, sad, or happy from your text: posts, Twitter data, comments, and so on.

What can a sentiment analyzer answer? Say a movie has been released recently and there is a huge uproar about it on Twitter. I can mine the Twitter data and understand whether the sentiment towards the movie is positive or negative, and what exactly it is positive about. Suppose people write on Twitter that the graphics are good but the storyline is weak. As a movie maker, I am not going to read every post on Twitter to learn what was good and what was weak. We need the computer to report that most people say the graphics were good, so there is no need to improve there, but most people say the storyline was weak, and that is the main point of improvement. That is what is called topic modeling. This is also used in product reviews, movie reviews, ad targeting, social media marketing, and so on; this is really the primary application of sentiment analysis.
Anyone who starts with sentiment analysis typically uses a movie data set and tries to predict, from the text of a review, the rating the user wanted to give the movie. What isn't easy about natural language processing, compared to computer vision, is the disagreement in human perception. Given an image of a cat, if I show it to, say, a hundred people, I have a 99.99% chance that everyone will say it's a cat. Given a piece of text of ten to twenty lines from a novel, literary scholars or people with a good background in English literature will find metaphors in it that delve into deep context, whereas another person might take it at face value. The problem with natural language processing tasks is that for every task we have, human accuracy doesn't go above 80%: if I take a tweet and ask a hundred people whether it is positive or negative, humans themselves disagree about 20% of the time. That is a basic problem for a machine: how can I feed a data set to a machine when the data set itself is "wrong" 20% of the time?

This is a very common Calvin and Hobbes comic: Hobbes asks Calvin whether he likes the new girl in class, and Calvin answers in the negative, but I'm pretty sure you can tell that the answer is not really negative; he actually does like the girl. If I feed this conversation into a computer, it will probably conclude that Calvin is not interested, whereas we need to recognize that the responses Calvin gives are sarcastic rather than literal. So the challenges are things like sarcasm detection and metaphor detection, and one of the most challenging data-cleaning tasks related to sentiment analysis is text speak. When Twitter imposes a limit of 140 characters, we don't tend to shorten our argument; we shorten the words we use in the argument. If I want to write a 200-character post, I won't rewrite it in fewer words; I'll write letters mixed with numbers, which might mean something to someone who knows SMS language, but to a computer that only understands dictionary words it means nothing, because it doesn't correspond to any word in the dictionary.

So how do we process simple English text in Python? The basic module for processing text in Python is NLTK. It provides an easy-to-use interface to corpora and lexical resources, it provides some basic classification, and it provides good implementations of tokenization, stemming, and tagging, including an interface to the Stanford parser. There are many corpora in NLTK, one being the movie reviews corpus and another being the Brown corpus, and you can use them directly: import nltk and call nltk.download(), which opens a GUI where you can tick the checkboxes for the corpora you want. Then, as in this example, if you download the Brown corpus you can read words directly from the corpus and run your analysis on them, rather than cleaning a data set yourself for the initial task.
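A minimal sketch of that workflow (the Brown corpus is one of the corpora mentioned; the download is a one-time step):

```python
import nltk

# nltk.download() with no arguments opens the GUI downloader where you can
# tick the corpora you want; passing a name fetches one corpus directly.
nltk.download('brown')

from nltk.corpus import brown

# Read words straight from the corpus instead of cleaning a raw data set
# yourself for a first experiment.
print(brown.words()[:20])
print(len(brown.words()))
```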
One of the simple things you can do with NLTK is POS tagging, which refers to part-of-speech tagging. Why is it necessary? When it comes to sentiment analysis, most of the sentiment in a sentence sits in the adjectives: "he's a bad person", "he's a good person", "I'm having a great time". The identifier that decides whether a statement is a positive or a negative reaction is the adjective. POS tagging helps us recover the sentence structure and the context: with NLTK you can pass a sentence through the word_tokenize function and the tagger, and it returns the words with their corresponding part-of-speech tags. From these you can pick out the adjectives directly, and, as we will see later, you can also use the conjunctions to define rules about which words matter. Take "the service was terrible but the rooms were clean": if I simply take the words "terrible" and "clean" and add them up, the result is neutral, but what the sentence actually says is that the first clause is negative and the clause after "but" is positive. To find those conjunctions we also use POS tagging, and NLTK can display parse trees directly as well.

Another basic use of NLTK is stemming: how do you convert plurals to singulars? We can import a stemmer, give it a list of plurals, and convert, say, "flies" to "fly" or "boys" to "boy". Why is this necessary? It helps in finding out what a person is talking about. Say one person writes in a movie review "the graphics were good", with "graphics" as a plural, and another writes "the graphic integration was good", with "graphic" in the singular. Both refer to the same thing, but if we feed them to a model directly it will treat "graphics" and "graphic" as two different words and apply different rules to them, so we first map all plurals to their singular form to handle them in one context.

Then there is named entity recognition, which decides whether a particular word or phrase is a proper noun or an ordinary noun; proper nouns are names, places, monuments, and so on. Named entity recognition is necessary for identifying what the person is talking about. If a tweet mentions, say, Hillary Clinton, named entity recognition will mark that as an entity and we will know the whole tweet is about that person; or if there is a mention of Singapore in a tweet, it will recognize Singapore as an entity and we will know the tweet talks about Singapore.
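Roughly, the three steps just described look like this in NLTK (a sketch; the example sentences are made up, and the tagger, stemmer, and chunker models need a one-time download):

```python
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.stem import PorterStemmer

# One-time downloads for the tokenizer, POS tagger, and NE chunker models.
for pkg in ['punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words']:
    nltk.download(pkg)

# Part-of-speech tagging: adjectives (JJ) carry most of the sentiment,
# and conjunctions (CC) such as "but" mark a shift in polarity.
tokens = word_tokenize("The service was terrible but the rooms were clean")
print(pos_tag(tokens))

# Stemming: map "graphics" and "graphic" (or "boys" and "boy") onto one form
# so the same rules apply to both.
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["graphics", "graphic", "boys"]])

# Named entity recognition: mark proper nouns such as people and places.
print(ne_chunk(pos_tag(word_tokenize("Hillary Clinton gave a speech in Singapore"))))
```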
So what are feature vectors? Feature vectors are a way of turning the different words of a document into one vectorized format, so that we can build a model and feed them in as training vectors. A very common feature representation is bag of words, which treats each word as its own entity and uses the count of its occurrences as a measure of how important it is: a document that contains "good" three times would be scored as more positive than a document that contains "good" twice, which in turn is more positive than a document that contains "good" once. But bag of words doesn't take context into consideration. Say I write a review of a mobile phone that says "the mobile is good but expensive", and a review of another phone that says "the mobile is expensive but really good". Bag of words considers each word on its own, yet the second review is more positive than the first: it says that although the phone is expensive, it is worth the expense, whereas the first says it is good but probably not worth the expense.

So what do we do to take context into consideration? We generally use n-dimensional vectors, usually produced with word2vec or GloVe. word2vec is a deep-learning-based model developed at Google which captures analogical and semantic relations between words. You have probably heard Google's famous example: we did analogies like this in aptitude tests, man is to woman as king is to question mark. word2vec can fill in the gap and say it would be "queen"; if we query it for something that is related to "king" but is not a man, it deduces "queen". What word2vec does is create a 300-dimensional vector for each word, taking context into account, so we can do arithmetic directly on those vectors, and the relations and placements of words are preserved. We can also compute similarities and build features from different compositions of words, taking two or three words at a time; these are called unigrams, bigrams, and n-grams.
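The talk does not name a library for this step, but as an illustration the analogy query could be run like this (gensim and the pretrained GloVe model here are assumptions, not something from the talk):

```python
import gensim.downloader as api

# A small pretrained GloVe embedding (100 dimensions); word2vec's news vectors
# are 300-dimensional but behave the same way for analogy queries.
model = api.load("glove-wiki-gigaword-100")

# man : woman :: king : ?  -- vector arithmetic fills in the gap with "queen".
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```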
What dictionaries and modules are available for sentiment analysis? One of the major resources is SenticNet, which is a mapping of concepts to polarity, where a concept's polarity depends on when it is said. Say "go to bed" is a concept: as a response like "I don't want to go to the party today, I want to go to bed" it is probably negative, but "I'm feeling fresh today, I went to bed early yesterday" is probably positive. SenticNet takes that into consideration and reports the polarity and subjectivity of the specific concept. Then there are other basic dictionaries that are commonly used, one being AFINN and one being SentiWordNet; AFINN was built with Twitter-style text, with word polarities decided from how the words are used in tweets.

Among the major Python modules related to sentiment analysis is TextBlob. TextBlob is useful for people who just want to plug in their sentences and don't want to build a model around sentiment analysis, but do want to perform it on a small scale, say for a startup, for the reviews they are getting, or for the feedback on their applications. TextBlob is an interface that implements a Naive Bayes classifier inside the module; you just supply it with a sentence and it responds with two values, polarity and subjectivity. Polarity below zero means the sentiment is negative; above zero means it is positive. Subjectivity captures whether the person is writing an opinion or a fact. Say there is an app review posted on Android that says "the application is really slow and hard to understand": that would probably count as more objective, a fact about the application, whereas a review that says "the application needs to have these and these features to be more comfortable" is a person expressing their own opinion. Subjectivity decides how subjective or objective the sentiment of a sentence is.
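A minimal sketch of that usage (the example sentence here is made up):

```python
from textblob import TextBlob

# TextBlob ships with a pretrained sentiment analyzer; just hand it a sentence.
review = TextBlob("The graphics were good but the storyline was really weak")

print(review.sentiment.polarity)      # < 0 negative, > 0 positive
print(review.sentiment.subjectivity)  # closer to 0 factual, closer to 1 opinion
```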
Sentiment labeling and classification methods are usually of two types: binary labeled and multi-labeled. Binary labeling is like having buckets, usually three of them: positive, neutral, and negative. You take a piece of text and decide which bucket it goes into, and you never put it into two buckets at once; you just produce an analysis report that says 70 percent of the texts were positive, 20 percent were negative, and 10 percent didn't say anything substantial. With single labels, if the trained model says the probability of a text being positive is 0.6, we put it into the positive bucket, even though 0.6 may not be sufficiently larger than the 0.4 probability of it being negative. Multi-labeled data instead attaches multiple tags to the same piece of text, with a probability for each. A news report can be both angering and sad at the same time: the assassination of a celebrity, for example, can make people angry and sad at once, and we don't have to force the text into either the "angry" bucket or the "sad" bucket.

The classification methods that are successful at a preliminary level are bag of words and k-nearest neighbors. In k-nearest neighbors you cluster the training data into different parts, then for a test item you find which cluster it is closest to and assign it to that cluster.

If we have to build a basic Naive Bayes sentiment analyzer, NLTK provides a corpus called movie_reviews. The first time we try to implement sentiment analysis and see what results it can produce, we can build a basic analyzer that gives around 72 percent accuracy in around 15 lines. We import the classify module from NLTK and the movie_reviews corpus. We define a function called word_feats, short for word features: we pass it a list of words and it creates a dictionary that maps each word that is present to True; words that are absent are simply not in the dictionary, which amounts to False. The movie_reviews corpus has file IDs labeled "neg" and "pos", and we extract them: the negative IDs are all the IDs of movies with negative reviews, and the positive IDs are those with positive reviews. Then we create the bag-of-word features with the previous function, applied to the words of the movie reviews in the negative IDs and in the positive IDs, so the negative features are the word features of reviews whose judgment was negative, and the positive features are those whose judgment was positive. The negative cutoff and positive cutoff are for cross-validation: we take 75 percent as the training set and one quarter as the test set, training on the first three quarters and testing on the rest. The training features are three quarters of the negative features plus three quarters of the positive features, and the test features are the remaining 25 percent of each. We then instantiate a Naive Bayes classifier and pass the training features to it as a parameter. The util module we imported at the start, nltk.classify.util, provides precision, recall, and accuracy; we give it the classifier and the test features, the 25 percent we didn't train on, and it calculates the accuracy, which comes out at around 72 percent for this very basic classification system. Finally, when you call show_most_informative_features on the classifier, it shows you whichever word features had the most weight, the top ten features that most influenced the decision when predicting on the test values.
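Put together, that walkthrough comes out roughly like this (a sketch following the standard NLTK movie_reviews example, which is the ~15-line classifier described):

```python
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews  # nltk.download('movie_reviews') once

def word_feats(words):
    # Map every word that is present to True; absent words are simply missing.
    return dict((word, True) for word in words)

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# First three quarters of each class for training, the remaining quarter for testing.
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4
trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]

classifier = NaiveBayesClassifier.train(trainfeats)
print('accuracy:', nltk.classify.util.accuracy(classifier, testfeats))

# Naive Bayes is a transparent model, so we can inspect which word features
# carried the most weight in the decision.
classifier.show_most_informative_features(10)
```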
Now, this is a transparent-box algorithm: you can call show_most_informative_features and understand which features are driving the decision. When we move towards deep learning and use RNNs and CNNs for sentiment analysis, we lose that transparency. The prediction gets better, above 85 percent, but we don't know what is driving it. As the previous talk mentioned, if I can predict a user's rating but I can't tell the movie maker that the problem is the genre or the target age group, then what use is sentiment analysis to the person actually performing it? To get at that, we do aspect-based sentiment analysis, where we find the sentiment for each aspect separately.

Then there is a kind of sentiment analysis that decides the stance towards a topic. Say the topic is nuclear power plants in a country and there is a huge amount of Twitter data on it: the government wants to know what the public thinks and what their major concerns about nuclear power plants are. That is called stance detection. There is also work on plotting happiness in movies, scripts, and books, and how happiness varies across geographic locations; a very popular project here is the Hedonometer, which I'll show in a moment. There has been work on predicting stock market fluctuations from how Twitter responds to, say, a company going public: if Twitter responds very well to the company going public, you should probably buy the stock, because the price is likely to rise in a few days. Twitter is generally a very good data set for research, firstly because it is public, and secondly because people try to be right on the point of what they want to say, since they are limited to 140 characters; you don't have to go through a whole text and prune it to find what they mean. And there has been work on creating musical pieces for novels based on their sentiment. What that research group wants to do is this: say you can't read the book, or even its first chapter; instead you get a musical file that tells you whether the emotions in the book are negative, how the emotions flow, whether they go from positive to negative and back, and whether the book ends on a happy note or a sad note, and then, depending on the mood you are in, you can decide whether to buy that book.

I'll show you the link to the Hedonometer. This is the Hedonometer, and this is their recent work, which plots the average happiness on Twitter by day. You can see that the Orlando shooting is somewhere down here, far below Mother's Day, and the death of Muhammad Ali is somewhere here. In this way you can read off the reaction or sentiment of the world at large to particular events, and see how people react to particular things and ideas. There are some anomalies in this process: there is FIFA here, which probably shouldn't be so low on the average-happiness axis. This is the kind of thing we try to correct for. It probably shouldn't be low, but when people talk about FIFA they talk about which team should lose; when a study is done on Twitter about sports, people tend to say more negative things about other teams than positive things about their own. They would rather say that Ronaldo is of no use than say that Messi is a great player, so FIFA tends to come out negative rather than positive. They also have work on exploring happiness in books; the most popular example would be Harry Potter. They build a time series over the novel, from the start to the end, and ask how the happiness moves: does it start at an okay-ish level, rise very high in between, drop back, and end on a neutral note? There is research that produces these plots and then relates them to the success of the novels, along with the authors and other variables. So that is one more application you can add in this area.
This is what I have been working on in the research assistantship I'm doing right now: adding sentiment and emotion to healthcare. You have probably all heard the news about Amazon adding emotion recognition to the Echo's assistant. What it does is this: when you talk to your virtual assistant, it picks up cues in your voice, the pitch and the modulation compared to your usual modulation, and then it responds according to how you are feeling. Say you ask it what you should eat: when you are in a good mood it will probably tell you to eat healthy food, but when it recognizes that you are in a bad mood it will probably recommend two tubs of Ben and Jerry's ice cream.

One particular use of sentiment in healthcare concerns stances among the general public on healthcare policies, especially abortion; the most prominent Republican-versus-Democrat debate is about whether abortion should be legalized or not. Sentiment is also used to predict personality, and it has been shown that a computer can predict your personality from your online footprint far better than any person can. The particular case I'm working on is to predict and discern cases of depression among social media users: how do you know, from a particular set of tweets, that a person might be depressed, might need intervention, or should be referred to a helpline? There has been work done on this; I think the earlier work is from 2014 and reported around 70% accuracy for a Twitter-based model predicting whether you are depressed and should see a psychologist. The model is now at around 81% accuracy at predicting whether you are depressed or should see a psychologist. It is an interdisciplinary field where, as my father puts it, the psychologists work with us to remove their own profession. It has been well established in psychology that people are more inclined to talk to computers, because computers don't judge them and have no liability to reveal the information to anyone, even under blackmail; a computer is completely non-judgmental, so people are willing to open up to it. If the computer can predict whether a person is depressed or needs medication, then people may prefer to talk to computers rather than to psychologists. So, back to this comic again: that is what we are trying to do. If a person talks this way to a psychologist, the most important and most tedious job is to work out that Calvin is actually saying that he likes her rather than that he doesn't. Thank you.
So basically the whole training set is classified into clusters, and the number of clusters is chosen by trying models with, usually, 5 to 15 clusters and deciding which clustering is best; the F-score metric is used to judge whether the model is accurate or not. If we are using k-nearest neighbors, we plot the points on a coordinate system, and what we use is the distance between the centroid of a cluster and the coordinate where the word occurs, so it is an x-y coordinate value. The numbers plotted on x and y can be anything, starting from word frequencies; when we use word2vec or GloVe, which produce higher-dimensional vectors, we no longer plot on a two-coordinate axis but compute the distance between the two vectors directly. When we do it with bag of words it is the frequency, like a histogram: if a word occurs twice in a document it sits at that position, the points are clustered, we find the centroid, and when another occurrence comes in we compute its frequency and its distance from the centroid of the cluster.

Can this be done for other languages? No, I don't actually know any language other than English well. To perform sentiment analysis in another language I would first have to know that language properly, to know what each sentence implies, and I'm not good at any language other than English.

So what we usually do is begin with this, but because these methods don't give good accuracy, we jump directly from NLTK to an implementation of deep learning in TensorFlow, and then try to find the features inside the black box.
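As a purely illustrative sketch of the frequency-vector and centroid-distance idea discussed above (scikit-learn and the toy sentences are assumptions, not something named in the talk):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = ["the graphics were good", "the graphics were great",
        "the storyline was weak", "the plot was weak and boring"]

# Bag of words: each document becomes a vector of word frequencies (a histogram).
vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()

# Cluster the training vectors; a new document is assigned to whichever
# cluster centroid it lies closest to.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
new = vec.transform(["the graphics look good"]).toarray()
dists = np.linalg.norm(km.cluster_centers_ - new, axis=1)
print("closest cluster:", int(dists.argmin()))
```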