So the topic is natural language processing. Our speaker is Abhinav Gupta. He's a fresh master's graduate from IIT Kharagpur. He interned at Zomato on natural language processing, determining review highlights from restaurant reviews and which dish to order at a restaurant. So that might be an indication of how this talk will go, but let's make it entertaining.

So the last talk by Nicholas was very encouraging, and he gave us the idea that we should work like a dojo: you should take away at least one thing from a talk that you can implement when you go back home. That is what I am trying to demonstrate here. Yesterday I attended a few talks on natural language processing and noticed that not many in the audience are familiar with it, so I want to give you an overview of how you can do it and how you can then grow in complexity. There are open source modules and libraries with which, within a couple of minutes, you can build the basic structure of what took me months. I will have some quick demonstrations for you; since I noticed yesterday that there can be issues when switching screens, I have embedded snapshots of all the IPython notebooks in the slides.

A short introduction: as already mentioned, I started with Andrew Ng's machine learning course, then went to Zomato, then did some courses in natural language processing, and I am presently working at American Express. If you have doubts during the talk, tweet them at my handle and I will reply once I am done, because there might be a couple of doubts about the concepts I am going to explain.
It is better that you tweet them at the same instant, so that I can resolve them instead of waiting for the end.

So this project started with school students who have discussions going on. A single teacher cannot monitor each and every discussion happening between two students in a school, so we wanted to develop a framework which can automatically judge which student is performing better in terms of argument quality and overall stance.

Let's start with an example. The movie Batman v Superman is coming in 2016, and there is a guy who starts a discussion by saying, 'I think that movie is so cool, and I think Superman is going to win with his Kryptonian abilities.' Then there are four girls on the other side. The first says, 'Superman is definitely more capable than Batman,' indicating she is biased towards Superman. The second says, 'Batman is so much more technologically advanced with his Batmobile,' so she prefers Batman. The third is not clear about her stance; she says both are equally powerful in their own ways. And then there is always this girl who says, 'Ben Affleck is so hot.'

These four girls show the complexities which can arise when we look at a debate. If someone says Superman is definitely more capable than Batman, we have to determine the stance: is she supporting Superman or Batman? If someone says both Batman and Superman are equally powerful, we have to check whether the argument is good enough; presenting a neutral argument does not help a discussion, it is basically just a random comment to keep the debate flowing.
The third complexity: someone says, 'Superman was very powerful, but now I think Batman is much more technologically advanced, so Batman is the better one.' Then the debater has changed his stance during the debate. And finally, if someone says 'Ben Affleck is so hot', it is simply not related to this discussion. So these are the four complexities which can arise in a debate: you have to judge the stance from the argument, judge whether the argument is good enough, whether the debater is firm on his stance, and whether the argument is relevant to the debate at all.

The first part, whether the argument is related to the debate, is done by sentence similarity. If you say a sentence and there is a counter-argument, both of them should be semantically, meaningfully similar. If the debate is Batman versus Superman and someone is talking about Batman, the other person should be talking about Superman, not about Iron Man in a completely different context. So the first pillar judges sentence similarity: whether the argument you have presented is similar to the previous argument. The second pillar is sentiment analysis, which judges the polarity of the argument; if someone has a positive stance on the topic and then changes it, we have to capture that. The third is backtracking: over the whole debate, has the debater changed his stance from positive to negative? Was he initially supporting Batman and is now supporting Superman? And the final part is how you score the complete debate on these criteria.

So I will start with the very basics of how you can do natural language processing, and then pick up each pillar one by one. First of all, why did I use NLTK?
I heard a perception yesterday that NLTK might be slow, but I have never faced that. I have used TextBlob, which is an extended implementation built on NLTK, and it works fine for me. I use NLTK because it ships with huge corpora: WordNet and other lexical databases are already included, and at the end of the day it is open source.

Now some very basic terminology. If you have a sentence like 'I do not feel very good about Monday mornings', tokenization is when you split it into individual words. Then there is part-of-speech tagging. You do not have to write ten or twenty lines for this; it can be done in a single line where you just pass in the sentence and it tags what is a pronoun, what is a verb, and so on. It really is that simple: you import the library, give it a sentence, call the tokenizer to tokenize it, and call the POS tagger to tag it. So to start with NLP you do not need very deep knowledge; you can start from these basics.

Then there is the concept of stop words. Stop words are words like 'is', 'am', 'are', 'not', 'nothing'. You can remove them easily too: there is a corpus known as stopwords in NLTK, and filtering against it leaves you with the filtered words, so 'I', 'do' and 'not' drop out of the example. Finally, stemming reduces words to their root form, so 'mornings' is reduced to 'morn'. Using NLTK, within maybe fifteen lines of code, you go from raw text to a filtered dataset.
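The whole cleaning pipeline just described can be sketched without NLTK at all (in NLTK these are one-liners: `nltk.word_tokenize`, `nltk.pos_tag`, `nltk.corpus.stopwords.words('english')`, and `PorterStemmer`); the stop word list and suffix rules below are tiny hand-picked stand-ins, not NLTK's real ones:

```python
import re

# A tiny hand-picked stop word list; NLTK's stopwords.words('english')
# has far more entries. This subset is just for illustration.
STOP_WORDS = {"i", "do", "not", "very", "about", "a", "the", "is", "am", "are"}

def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def remove_stop_words(tokens):
    """Drop high-frequency function words (careful: 'not' matters in debates)."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Very crude suffix stripper standing in for NLTK's PorterStemmer."""
    for suffix in ("ings", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

sentence = "I do not feel very good about Monday mornings"
tokens = tokenize(sentence)
filtered = remove_stop_words(tokens)
stems = [stem(t) for t in filtered]
print(filtered)  # ['feel', 'good', 'monday', 'mornings']
print(stems)     # ['feel', 'good', 'monday', 'morn']
```

Note that, as the Q&A later points out, removing 'not' is exactly what you cannot afford to do on debate text; this generic pipeline is the starting point, not the final word.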
That is how the initial corpus cleaning works: you clean the text of stop words and other noise. Then you progress to sentence similarity, which judges whether your argument is related to the debate or not.

Suppose you want to measure whether two sentences are similar. You check two things: whether they have the same meaning, and whether they have the same word order. In the simplest form, if you have a sentence like 'a new NASA initiative will help lead the search for signs of life beyond our solar system', you first filter it to get the processed sentence. Then you build a joint set: you merge both sentences and deduplicate the common words. Over that joint set it is basically a cosine similarity, where you compare word by word to find whether the words match. This is the most basic level of similarity; you use the cosine function available in any numeric module to calculate whether two sentences are similar.

Then you can add higher-level filters using WordNet, a lexical database built at Princeton where everything is organized hierarchically: there is an entity, which is a life form, then an animal; humans are organized into adult, male, female, juvenile, and so on. You can use the hierarchy to calculate similarity from path lengths. To get the similarity between 'boy' and 'girl', you go from boy to male, male to person, person to female, and female to girl, so between boy and girl you have stepped through three intermediate nodes. Now compare 'animal' against 'boy', and 'animal' against 'teacher'.
For animal you have life form, person, male: three steps. For teacher you have person, adult, professional, educator: five steps. But if you only compare the raw number of steps, there is a pitfall: you also have to take into account how deep in the hierarchy the words sit, not just how short the path is. Using this concept you can enhance the model, and I have given a reference to the research paper, which explains the whole ideology of sentence similarity very well.

Here is how quickly you can do it. You use the TextBlob library, give it a sentence, and it tags the parts of speech, marking the nouns and so on, for both sentences. Then you use WordNet directly. Suppose there are two words, 'search' and 'hunt'. You also supply the context, telling it both are nouns so the comparison is relevant, and then you can directly calculate the path similarity with a simple function call. So with a few lines of code you can implement this WordNet concept.

There is also a word order similarity concept. Look at the sentence 'a quick brown dog jumps over the lazy fox' and the same sentence with 'dog' and 'fox' swapped. They contain exactly the same words, but reversing dog and fox gives a completely different meaning. Traditional cosine similarity, or any bag-of-words method, will just report that both contain the same words and call them identical. So here the word order matters: you compare the word positions one by one and formulate a score from that. I am not going into the mathematics right now, just showing how it can be done.
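Going back a step, the path counting from the boy/girl and animal/teacher example can be re-created with a toy hierarchy. In NLTK itself this is roughly one line, something like `wn.synset('search.n.01').path_similarity(wn.synset('hunt.n.01'))`; the node names and edges below are mine, a minimal sketch of the same idea:

```python
from collections import deque

# A toy fragment of a WordNet-style hierarchy, matching the talk's example.
# Edges are undirected parent<->child links.
EDGES = {
    ("life form", "animal"), ("life form", "person"),
    ("person", "male"), ("person", "female"), ("person", "adult"),
    ("male", "boy"), ("female", "girl"),
    ("adult", "professional"), ("professional", "educator"),
    ("educator", "teacher"),
}

def neighbours(node):
    return [b if a == node else a for a, b in EDGES if node in (a, b)]

def path_length(src, dst):
    """Shortest number of edges between two nodes (breadth-first search)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(a, b):
    """WordNet-style score: 1 / (1 + shortest path length)."""
    return 1.0 / (1 + path_length(a, b))

print(path_similarity("boy", "girl"))     # boy-male-person-female-girl: 4 edges
print(path_similarity("boy", "teacher"))  # longer path, so a lower score
```

The pitfall from the talk is visible here too: raw edge counts ignore how deep in the hierarchy the nodes sit, which is what the referenced paper corrects for.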
This word order similarity comes down to a formula over vector norms, which you can compute with numpy.linalg, NumPy's linear algebra module. Once you have both the semantic similarity and the word order similarity, you define an overall score. This is already done in a research paper, so there is nothing new about it, but the ideology is to use the semantics as well as the word order to judge whether two sentences have similar meaning and are talking in the same context.

There are some flaws, which I will show you. If you say 'I like that bachelor' and 'I like that unmarried man', those two sentences mean the same thing, and they get a high similarity score, which is right. But for some pairs where both sentences contain the same words in a very similar order, the algorithm still gives a high score even though one is an opinion and the other is a question. These are minor flaws, and they are tolerable here, because in a debate you are mostly not asking your opponent a question; you are presenting your own argument with its own strength. Look at these examples: 'I have a pen' and 'where do you live' get a similarity of zero, while 'I have a pen' and 'where is ink' get a non-zero score, because a pen is related to ink. So the model works as a proof of concept: the higher the score, the higher the semantic or conceptual similarity. That is how the semantics pillar works, checking whether the present argument and the previous argument are logically related. Now comes the question of whether you are changing your stance.
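Before moving on to sentiment, the whole sentence similarity pillar, joint-set cosine plus word order blended into one score, can be sketched library-free. The referenced paper blends the two with a weight delta (0.85 below is an assumed value), and the norms would normally be computed with `numpy.linalg.norm`:

```python
import math
from collections import Counter

def cosine_similarity(s1, s2):
    """Bag-of-words cosine similarity over the joint word set."""
    a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) | set(b))
    norm = math.sqrt(sum(c * c for c in a.values())) * \
           math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def word_order_similarity(s1, s2):
    """1 - ||r1 - r2|| / ||r1 + r2|| over word-position vectors."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    joint = sorted(set(w1) | set(w2))
    r1 = [w1.index(w) + 1 if w in w1 else 0 for w in joint]
    r2 = [w2.index(w) + 1 if w in w2 else 0 for w in joint]
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0

def sentence_similarity(s1, s2, delta=0.85):
    """Blend semantics and word order into one overall score."""
    return delta * cosine_similarity(s1, s2) + \
           (1 - delta) * word_order_similarity(s1, s2)

s1 = "a quick brown dog jumps over the lazy fox"
s2 = "a quick brown fox jumps over the lazy dog"
# Same bag of words, different word order: cosine alone calls them
# identical, the word order term pulls the blended score below 1.
print(cosine_similarity(s1, s2))  # 1.0
print(round(word_order_similarity(s1, s2), 3))
print(round(sentence_similarity(s1, s2), 3))
```

This reproduces the dog/fox behaviour from the slides: identical vocabularies, but the swap is still penalized.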
Sentiment is basically this: if I say 'Superman is very cool', that is a positive opinion about Superman. So am I expressing a positive opinion, a negative opinion, or am I neutral about the whole thing? There is a site known as text-processing.com with an easily accessible API, but in my case most of these services did not work for debate. Take 'I disagree with your statement', a typical debate sentence: the overall sentiment comes back positive, which is obviously wrong. Give it something more complicated, 'I disagree with your statement, I think homework should not be banned': this statement is positive with respect to homework, but the available solutions cannot process it, because most of them are trained on movie review corpora. With a movie review (leaving sarcasm aside), you can directly tell whether it is positive or negative. But in a debate you have to be polite while contradicting the person you are replying to, so we need a model specifically trained for that.

If you want something very easy, TextBlob itself has a module: give it a sentence, ask for the sentiment, and it answers directly, without you building any model of your own. You do not need to code anything, but this does not work here either: 'I disagree with your statement' comes out with a polarity of zero, as if there were no positive or negative opinion at all. The subjectivity score says whether it is an opinion or a fact. If I say 'Superman has Kryptonian abilities', that is a fact about Superman.
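The polarity/subjectivity distinction can be mimicked with a toy lexicon. Real usage is just `TextBlob(text).sentiment`, which returns a `(polarity, subjectivity)` pair; the lexicon entries below are invented for illustration, not TextBlob's actual values:

```python
# Toy sentiment lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# All entries are made up for this sketch.
LEXICON = {
    "cool": (0.6, 0.8),
    "hot": (0.5, 0.9),
    "disagree": (-0.4, 0.6),
    "powerful": (0.4, 0.5),
}

def sentiment(sentence):
    """Average the (polarity, subjectivity) of known words, TextBlob-style."""
    hits = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    if not hits:
        return (0.0, 0.0)  # no opinion words: neutral and objective
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return (polarity, subjectivity)

print(sentiment("superman is very cool"))              # positive and subjective
print(sentiment("superman has kryptonian abilities"))  # (0.0, 0.0): a fact
```

The weakness the talk describes falls out immediately: a lexicon average has no way to see that 'I disagree with your statement' is negative in a debate context, which is what motivates training a classifier instead.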
But if I say 'Superman is very cool', that is my opinion of Superman, and it can change from person to person. So those are examples of how you can use the off-the-shelf tools, but they do not work for debates right now.

We used a very simple approach instead: a standard Naive Bayes classifier trained on a Congressional debate corpus. The link is given in the corpus section. The dataset is very well maintained; it was built at Cornell University. It is a database of debates where every argument, maybe ten or fifteen lines long, is tagged as to whether the debater argued in favor or against. A Naive Bayes classifier is basically a conditional probability model: it calculates the probability that, given this sequence of words, the sentence is positive or negative. So you give it the labelled Congressional debate corpus as training data, telling it whether each argument was positive or negative, and on new text the classifier outputs a probability score. If the score is above 0.6, which is our own benchmark, we call it positive sentiment; if it falls on neither the positive nor the negative side, it is neutral. Zero represents neutral and one represents entirely positive. Based on the classifier output, you determine whether the sentence was positive or not, and I will show the results shortly.

Talking about the features: there is an article by StreamHacker on text classification.
They demonstrate that for sentences you should use bigrams. A bigram is a pair of two consecutive words: in 'Superman is cool', 'Superman is' is a bigram and 'is cool' is a bigram. The idea is to combine bigrams with high-information single words. If a word like 'disagree' occurs, that by itself can indicate the person has a negative sentiment. Using such words together with bigrams you can get a fairly powerful model. This model has an accuracy of about 65% on the sentiment task right now, and the best I have seen among open source modules is around 70%. There is also a model from MIT, but I am not sure how to implement it.

So this is how it handles our earlier problem cases. Once you have trained it, classification is a single line, classifier.classify on the present sentence. For 'I disagree with your statement' it says negative, and for 'I am against a homework ban' the overall sentence is positive, because you are positively aligned towards homework. That is how the model behaves with the Naive Bayes classifier.

So now you have sentence similarity, which checks whether the argument is related to the debate, and sentiment analysis, which checks the polarity of the argument: positive, negative, or neutral. The remaining question is this: I can compare two sentences and tell whether they are similar, but which two sentences should I compare? I can tell that an argument is positive or negative, but how do I know what the debater's previous stance was, whether his previous argument was negative?
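A miniature hand-rolled version of this classifier, with unigram plus bigram features as in the StreamHacker article. The talk itself uses NLTK's `NaiveBayesClassifier`; the four-sentence training set below is an invented stand-in for the labelled Congressional debate corpus:

```python
import math
from collections import Counter, defaultdict

def features(sentence):
    """Unigram + bigram features, as in the StreamHacker article."""
    words = sentence.lower().split()
    return words + [" ".join(b) for b in zip(words, words[1:])]

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace smoothing."""

    def train(self, labelled):
        self.counts = defaultdict(Counter)  # label -> feature counts
        self.labels = Counter()             # label -> document count
        self.vocab = set()
        for text, label in labelled:
            self.labels[label] += 1
            for f in features(text):
                self.counts[label][f] += 1
                self.vocab.add(f)

    def classify(self, text):
        scores = {}
        for label in self.labels:
            total = sum(self.counts[label].values())
            # log P(label) + sum of smoothed log P(feature | label)
            score = math.log(self.labels[label] / sum(self.labels.values()))
            for f in features(text):
                score += math.log((self.counts[label][f] + 1) /
                                  (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# Tiny invented stand-in for the labelled Congressional debate corpus.
corpus = [
    ("i disagree with your statement", "negative"),
    ("i am against this notion", "negative"),
    ("i strongly support this motion", "positive"),
    ("i am in favor of this notion", "positive"),
]
nb = NaiveBayes()
nb.train(corpus)
print(nb.classify("i disagree with this motion"))  # negative
```

With enough labelled debate data, the bigram 'i disagree' ends up carrying exactly the signal that the generic movie-review models miss.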
If his previous argument was negative, this argument should also be negative. So I have to keep track of which two sentences to compare and which sentences should share a stance. This is what we call backtracking: tracking whether the debater has changed his stance and whether each argument is relevant and good enough.

First, here is how the interface looks. It was done in Django; there is a Django chat plugin, developed with Sharon, which is easy to understand and has a very good API. One advantage is that every chat log is directly available in JSON format. In the framework there is an admin who starts the whole thing, a user one who presents arguments, and a user two who counters them; it is presently a two-player setup. And here is a visualization of how a whole debate went. I did this manually in Python; the system works without this much visualization.

Suppose the admin starts the debate: should we ban homework or not? Someone says 'I am in favor of this notion'; taken together with 'should we ban homework', he is against homework. Someone else says 'I am against this notion', so she is in favor of the whole concept of homework. Now look at the first gentleman. His next argument is 'homework has little educational worth'. His present argument is negative, and his overall stance, that he was always against homework, is also negative, so he is consistently following what he initially proposed. Whereas the lady on the other side said 'I am against this notion'.
Since she is inclined towards homework, she should always present arguments in favor of homework. We have to keep track of that: if user one has an overall negative stance towards homework, he or she should maintain it through the complete debate. That is the crux of the whole thing: if your fresh argument says 'I am against homework', then most of your subsequent arguments should reflect that you are against homework. Counter-arguments work the other way: if one argument is 'homework has little educational worth' and the reply is 'there is a strong and positive relationship', the two arguments should be opposite in polarity to each other. And finally, if someone posts two consecutive arguments together, those can share the same polarity. These are some of the technicalities I have put in the slides, so if you want to build this, you know what to consider.

Also, when someone says 'homework has little educational worth', we have to check whether he is actually talking about homework, so we also have to compare whether his sentence and the earlier ones are meaningfully similar to each other. You have to track the sentiments as well as the semantics.

A quick recap. In a debate, you start by measuring sentence similarity: the similarity score between two sentences tells you whether they are meaningfully similar and whether their word order is similar. Then you have the polarity of the argument, which says whether it is positive or negative.
Then you have backtracking, which tells you between which sentences the similarity should be measured and for which sentences you should measure sentiment. Finally, with all three pillars in place, you can arrive at a scoring approach. Right now the scoring approach is pretty naive, and I am looking for inputs from the audience here; we have implemented it and it has worked for our few sample cases. We have a sentence-level similarity score and a semantic score. If the argument flow is correct, that is, the user has not changed his stance, we give 75% weightage to the semantic score, basically the meaning score, and 25% weightage to the polarity score. But if the user has changed his stance during the debate, we penalize him for changing it. Then there is an amplification factor. Suppose you present a very good argument; the opponent has to counter it or he is going to lose ground, because the argument quality is high. Some arguments attract discussion and invite others to counter them, and the more discussion grows on an argument, the more relevant or strong that argument is. The amplification factor counts how much discussion has gone on a particular argument. Combining all of this gives you a score; the exact expression is technical and you can go through it on the slide. And this is my final slide: the present challenges I am facing right now.
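The scoring behaviour just described can be sketched as one function. Only the 75/25 split comes from the talk; the stance penalty and amplification rate below are assumed placeholder values, not the talk's actual numbers:

```python
def argument_score(semantic, polarity, stance_changed, replies=0,
                   stance_penalty=0.5, amplification_rate=0.1):
    """Score one argument along the lines described in the talk.

    semantic and polarity are scores in [0, 1] from the similarity and
    sentiment pillars. Only the 0.75/0.25 weighting is from the talk;
    stance_penalty and amplification_rate are illustrative guesses.
    """
    base = 0.75 * semantic + 0.25 * polarity
    if stance_changed:
        base *= stance_penalty  # penalize a debater who flips stance
    # Arguments that attract a long discussion thread count for more.
    amplification = 1 + amplification_rate * replies
    return base * amplification

print(argument_score(0.8, 0.9, stance_changed=False, replies=2))
print(argument_score(0.8, 0.9, stance_changed=True, replies=2))  # lower
```

The design choice mirrors the talk: relevance (semantics) dominates, polarity refines, stance changes cost you, and arguments that provoke replies are rewarded.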
As you might have seen, this is all natural language processing, but it says nothing about what happens if I present a fact from some UN report saying homework has affected or benefited so many people. That is a valid point: I have not considered factual strength. Judging how strong a fact is could be done by scraping the web, and that is not implemented. Also, this has been tested on only a couple of use cases, which is why I am looking for more arguments to test on. If you want to test it, there is a pretty comprehensive site, idebate.org, with a debate database; you can scrape all the debates and validate the model against them. And finally, in most Naive Bayes applications there is a separate classifier for neutral arguments, because neutral arguments have complexities of their own, such as sarcasm, where someone is not stating an opinion but just trying to counter one. Presently our Naive Bayes classifier is not designed to capture neutral arguments separately. So these are the flaws and challenges in it right now. I think I am done fairly quickly, so I am open to questions.

Hello, thank you for the talk. When we talk about debates, formal debates are moderated and everything is in a flow, but the Django app you described is a chat-based debate, right? In chat-based debates the replies are not direct and moderated. For example, in the Batman versus Superman case, if there is an argument that Batman is really cool and advanced, the reply might be 'yeah, right!' with an exclamation mark. So sarcastic comments: how are you tackling them? How are you capturing their intent as well as their sentiment?
Right now the Naive Bayes classifier cannot capture sarcasm, but if someone wants to, there is an implementation from MIT for sentiment analysis where they capture sentiment using a tree-based model, and there is an API available for it. But yes, the point stands: it is not captured right now. And if someone just says 'yeah, that's right', they are only confirming a stance, not presenting an opinion, so we do not score that right now.

Hello, this side. Let's say your corpus is from social data. Nowadays on social media people use short text, and even mixed-language text; take Twitter as an example. How are you going to handle that with NLP? Do we have any classifiers?

Yes, so for Twitter sentiment analysis I have gone through several papers, and the Naive Bayes classifier has worked best, but there are certain things you have to take care of. First, emoticons depict a particular sentiment, so you need to handle them as separate features the classifier can learn probabilities for. Then you have to normalize the text: if someone writes 'hello' with ten or eleven o's, you collapse it, and there are vocabularies which expand short forms into their full words. You have to clean the data that way, and as far as I know, the Naive Bayes classifier still works best for this kind of social sentiment analysis.

Hi. My question is about the sentence similarity you talked about.
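The normalization steps from this answer, mapping emoticons to sentiment tokens, collapsing repeated letters, and expanding short forms, might look like this (both lookup tables are tiny invented examples, not any real system's vocabulary):

```python
import re

# Small invented lookup tables; real systems use much larger ones.
EMOTICONS = {":)": "EMO_POS", ":(": "EMO_NEG", ":d": "EMO_POS"}
SHORT_FORMS = {"u": "you", "r": "are", "gr8": "great"}

def normalize_tweet(text):
    """Clean a tweet before it reaches the Naive Bayes classifier."""
    tokens = []
    for token in text.lower().split():
        if token in EMOTICONS:
            tokens.append(EMOTICONS[token])  # keep the emoticon's sentiment
            continue
        # Collapse 3+ repeated letters: 'helloooo' -> 'hello'
        token = re.sub(r"(.)\1{2,}", r"\1", token)
        tokens.append(SHORT_FORMS.get(token, token))
    return " ".join(tokens)

print(normalize_tweet("helloooo u r gr8 :)"))
# hello you are great EMO_POS
```

Keeping emoticons as explicit `EMO_*` tokens, rather than stripping them, is what lets the classifier learn their polarity like any other feature.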
If the debaters use pronouns in their sentences, how will you analyze the similarity between the sentences? Can you give me an example? Say the debate is about Congress and BJP. Someone says 'BJP is working quite badly', and the next sentence is 'they are the worst'. How will you tell that these sentences are similar and talking about the same thing in the same context?

If the same debater is continuing his argument with 'they are the worst', you can track it easily, because we check who said the previous argument, so we know whether it is the same person or the opponent. If the same guy said 'Congress is bad' and then says 'they are the worst', we can resolve it. If someone else starts a fresh argument with a pronoun, that is difficult to track.

And what if the very first statement uses pronouns? How will you get the context of the pronoun? That is an assumption here: if the debate is 'should we ban homework' and someone says 'yeah, you should ban it', I assume 'it' refers to homework. There is no pronoun disambiguation, resolving whether a pronoun refers to BJP or to Congress; there is also word sense disambiguation. These are complexities I have not included in this particular model. Thank you.

You gave that example, 'I do not feel very good about Monday mornings'.
You mentioned that 'not' is a stop word, but it actually completely reverses the meaning of the sentence. I don't think 'not' can be a stop word.

Right, if you use the standard stop word list to remove stop words, it will remove 'not'. But in the case of debates you cannot afford to do that. When this analysis is done on the debate corpus with features, you do not remove stop words at all: you keep the sentence as it is and then process it. If you want to read more, there is an article on exactly this. Words like 'not', 'nothing' and 'do not' are removed in most generic setups, but you cannot afford to remove them from a debate corpus.

On how the threshold was set: in most cases I looked at the data and at some simple cases to get an idea of it. Initially it was a simple judgment call for me: above 0.5 it is a positive argument, below 0.5 a negative one. But then there are neutral arguments too, so it is a manual thing right now. That is why you need one Naive Bayes classifier for positive and negative and a separate one for neutral arguments, so that you can drop this manual step and have the whole thing automated.

In real-world debates, it is common for opponents to qualify their position, I mean, to agree with the opponent's position for some time and then constructively contradict them. How do you tackle that problem?

Using sarcasm or contradicting your own previous statement is still not handled in this.

Not contradiction exactly, let me give an example.
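A debate-safe stop word filter along the lines of this answer would keep negations even when the generic list contains them (both word lists below are small hand-picked examples, not NLTK's real lists):

```python
# Negation words a debate system must NOT treat as stop words,
# even though generic stop word lists (e.g. NLTK's) include some of them.
NEGATIONS = {"not", "no", "nor", "never", "don't", "cannot"}
STOP_WORDS = {"i", "do", "not", "very", "about", "a", "the", "is", "am", "are"}

def remove_stop_words_keep_negations(tokens):
    """Filter stop words but always keep negations."""
    return [t for t in tokens if t in NEGATIONS or t not in STOP_WORDS]

tokens = "i do not feel very good about monday mornings".split()
print(remove_stop_words_keep_negations(tokens))
# 'not' survives, so the negative meaning of the sentence is preserved
```

The same sentence through the generic filter from earlier would lose 'not' and flip its apparent sentiment, which is exactly the questioner's objection.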
Regarding the homework example: let's say I start my argument with "it is a well-known fact that homework is good for a student, but in today's world it has become more of a liability." So it is first qualifying the position and then contradicting it. If you check, the overall stance of the whole sentence comes out negative, because I check this with the stance classifier, and the Naive Bayes classifier is trained on the Congress debate data. We have checked cases like this, where you initially present a stance and then counter it in the second half of the sentence, and it works out as negative on the corpus.

Hey, is there any alternative to the Naive Bayes classifier? I used it for sentiment analysis on tweets. I didn't use the NLTK corpus; I used my own corpus with a large collection of tweets, and the training phase took a lot of time with the Naive Bayes classifier. So is there any alternative? There are alternatives; people have run even neural networks and decision trees on these tasks. But I believe with the Naive Bayes classifier the time is mostly about the preprocessing. If you do not remove smileys and do not clean up the vocabulary, then it is going to take time. But the Naive Bayes classifier, as far as I know, is the best in the case of social sentiment analysis, where you have to analyze social arguments.

So have you tried using SentiWordNet? It used to be available along with the corpus. Yeah, I have not used SentiWordNet. I have just used TextBlob and text-processing.com, basically, to implement these things. You haven't tried it? Yeah, I haven't tried it. But this project was done under the supervision of one of the professors as well, which is how we came to know about the debate scenario.
So there is a paper about how the debate scenario is very different, because it is more about polite disagreement, as compared to the normal kind of disagreement where you say something like "this movie was so horrible that I could not even sit through the first half." In debates you do not get that strong a sentiment to capture, so you have to capture mild sentiment, which most approaches do not do.

Did you use any feature transformation while training the classifier, like TF-IDF, or did you just take the counts of the bigrams you used? Yeah, the top 200 unigrams and bigrams are calculated that way: the TF-IDF concept is applied, so if some bigram is very frequent across all the debates, that bigram is given a lower priority. TF-IDF is considered for each of the unigrams as well. NLTK has built-in functionality which can give you the top 200 unigrams or bigrams based on their term frequency and document frequency. In information-retrieval terms, it measures how much information a unigram or a bigram carries.

Okay, Abhinav, one question. I was doing Twitter analysis, and the problem I was facing, taking the Superman versus Batman example, is that one statement could be "Superman is not cool" and another could be "Superman is not bad." This "not" changes the sentence completely, right? If I give a fixed number for "not," like minus one, how do you dynamically change it according to the sentence? "Not" with "cool" is like "not good," a negative, while "not bad" is a positive; the complete sentence is positive. So how do you deal with this? Right now it is a probabilistic model: "not bad" reflects something mildly positive, whereas "not cool" is negative and very common, and the Naive Bayes classifier is a probabilistic model.
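The down-weighting of bigrams that are frequent across all debates could be sketched with a toy TF-IDF scorer. This is a stdlib illustration under assumptions, not the NLTK built-in the answer refers to; `top_bigrams_tfidf` is a hypothetical helper.

```python
import math
from collections import Counter

def top_bigrams_tfidf(documents, k):
    """Rank bigrams by a TF-IDF-style weight: frequent within a debate,
    but penalised if they appear across many debates."""
    def bigrams(doc):
        words = doc.lower().split()
        return list(zip(words, words[1:]))

    doc_bigrams = [bigrams(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many debates does each bigram occur?
    df = Counter()
    for bg in doc_bigrams:
        df.update(set(bg))
    # TF-IDF score summed over all debates.
    scores = Counter()
    for bg in doc_bigrams:
        tf = Counter(bg)
        for b, f in tf.items():
            scores[b] += f * math.log(n_docs / df[b])
    return [b for b, _ in scores.most_common(k)]

debates = [
    "superman is not cool not cool",
    "superman is great",
    "superman is strong",
]
print(top_bigrams_tfidf(debates, 1))  # -> [('not', 'cool')]
```

Note that a bigram like "superman is", which occurs in every debate, gets an IDF of log(1) = 0 and so drops to the bottom of the ranking, which is exactly the "lower priority" behaviour described above.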
So if it is "not cool," it is going to get a higher negative sentiment score, but if it is "not bad," it is going to get a lower positive sentiment score. It is a probabilistic model: how frequently a phrase occurs determines the probability it is given. To say it once again clearly: "Superman is not bad" is a very rare phrasing in the debate corpus, so it is mildly positive and gets a low positive sentiment score, whereas "Superman is not cool" is a very common case in debates, so it gets a strongly negative score. That way you can judge which is the stronger argument in normal cases.

Actually, two questions. Yeah. First, how do you detect double negations, one negation coupled with another? That is handled the same way, by the probabilistic model; it is what it is. Was it something like word2vec, where you try to see the context and score based on that? So I have not implemented word2vec. I have seen that it is a good representation of words, but I have not seen how it works for sentence similarity. This is a simple Naive Bayes classifier: an independence assumption over probabilities, where, given the different words, you calculate the overall sentiment of the sentence. Yeah, it is a simple Naive Bayes classifier.

Last question. Yeah. Most debates are speech, and to use NLTK you have to have text, so speech-to-text transformation is another challenge. How do you tackle that? Yeah, that is a really good question. The sentiment step already has about 70% accuracy; speech recognition might have another 70% accuracy, so cascaded it reduces to about 49%.
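A toy Naive Bayes classifier over unigram and bigram features shows how "not cool" and "not bad" can receive opposite labels from a purely probabilistic model. The tiny training corpus below is invented for illustration and is not the actual Congress debate data.

```python
import math
from collections import Counter

# Invented toy corpus: "not cool" is common and negative, "not bad" is
# rare and positive, mirroring the frequencies described in the talk.
TRAIN = [
    ("superman is not cool", "neg"),
    ("batman is not cool at all", "neg"),
    ("that argument is not cool", "neg"),
    ("superman is not bad", "pos"),
    ("batman is great", "pos"),
    ("superman is amazing", "pos"),
]

def features(text):
    words = text.lower().split()
    return words + [" ".join(b) for b in zip(words, words[1:])]

def train(data):
    counts = {"pos": Counter(), "neg": Counter()}
    priors = Counter()
    for text, label in data:
        priors[label] += 1
        counts[label].update(features(text))
    return counts, priors

def classify(text, counts, priors):
    total = sum(priors.values())
    vocab = set(counts["pos"]) | set(counts["neg"])
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for f in features(text):
            lp += math.log((counts[label][f] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

counts, priors = train(TRAIN)
print(classify("batman is not cool", counts, priors))  # -> neg
print(classify("batman is not bad", counts, priors))   # -> pos
```

Because the model only multiplies per-feature probabilities, the bigram "not bad" carries its own learned polarity; no hand-set "-1 for not" rule is needed, which is the point made in the answer.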
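The error-compounding point about speech input can be made concrete: if stage errors are independent, pipeline accuracies multiply. `cascade_accuracy` is a hypothetical helper for the arithmetic.

```python
# Accuracies of independent pipeline stages multiply: a 70%-accurate
# speech-to-text step feeding a 70%-accurate sentiment step yields
# roughly 49% end-to-end accuracy (assuming independent errors).
def cascade_accuracy(*stages):
    acc = 1.0
    for a in stages:
        acc *= a
    return acc

print(round(cascade_accuracy(0.70, 0.70), 2))  # -> 0.49
```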
So this might not work well in the case of speech debates, such as Congress speech debates. For these debates, when a debate occurs, there is a typist actually noting down all the comments, and that is how the corpus is made; that is why there is a text-based framework for it. I am not sure how it can be done in the case of speech, where the accuracy drops to around 49%, so you would not have a fair judgment of whether an argument is good or not. Thank you everyone. Thanks, Abhinav.