So, let us start with what natural language processing is. There are entire courses devoted just to introducing natural language processing itself, so here I will only introduce the few basic concepts needed to create an automated essay grading system, which is what we will build in the next video.

Let us talk about the first two concepts, lemma and stem. Lemmatization means that, given a word, we identify its root or base form; in other words, we group together the inflected forms of a word. For example, take eating, ate, eats: the past tense "ate" and the forms "eating" and "eats" all map to one root word, the verb "eat". The verb "eat" has been used as ate, eating, eats. Likewise, talking, talked, talks all map to the root word "talk". That is a lemma: you find the root word so that all the inflected forms of a word are grouped into that one form, the lemma. In English this is easy because dictionaries exist; a lot of people have worked on this, and all the inflected forms of words are catalogued. So when a new word comes in, we just use the dictionary to look up its root word. The system for doing this is well developed.

One issue with lemmatization is that we might lose the meaning of some words. To avoid that, another process called stemming is used. Stemming is very simple: it is a set of logical rules, like remove "ing", remove "ed", remove "s", and so on. So "ate" stays "ate"; it will not be converted to "eat". "Eating" has its "ing" removed and becomes "eat"; "talking" becomes "talk". If a word ends with a suffix like "ing", "ed", or "ly", stemming removes it: you keep the stem of the word and strip off the extra suffix. That is the thousand-foot view of what lemma and stemming are. I am not talking about the mathematics or the algorithms behind them, because that is not needed here. If you find this week's videos interesting and want to know more, I request you to go and check the Natural Language Processing course by Professor Dan Jurafsky on YouTube; it is available freely, and if you like it you can also take it as a full course. It explains the introduction to natural language processing completely: how these word forms are extracted, what information extraction is, everything is discussed in that course. It is interesting to check out.

Now that you have seen what lemma and stem are in their basic form, here is a small activity. Consider the sentence "walking is good for health, but jogging is better than walking." Can you write the lemmatized form of this sentence, and also the stemmed words from it? Pause the video, write down your answers, then resume the video to continue.

The lemmatized form is simple, right? We know the lemma is the root form. So "walking" becomes its root "walk"; "is" has the root form "be"; then "good for health"; "jogging" becomes "jog"; "is" becomes "be"; "better" is actually a form of "good"; then "than walk". That is the lemmatized form. The stemmed form is a bit different: "walking" becomes "walk", but "is" stays "is"; then "good for health"; "jogging" is reduced to "jog", because a good stemmer removes not just "ing" but also the doubled "g". If a stemmer does not have that rule, it will output "jogg", j-o-g-g. And "better" stays "better"; that is the problem, the stemmer has no rule to reduce a word like that; then "than walk". So, that is the difference between lemma and stem.
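To make this concrete, here is a minimal sketch in Python using the NLTK library. The choice of NLTK and the hand-supplied part-of-speech tags are my own illustration, not something from the lecture; it assumes NLTK is installed and its WordNet data has been downloaded.

    # Lemmatization vs. stemming on the activity sentence.
    # Requires: pip install nltk, then once:
    #   import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')
    from nltk.stem import WordNetLemmatizer, PorterStemmer

    lemmatizer = WordNetLemmatizer()
    stemmer = PorterStemmer()

    # (word, part-of-speech) pairs; the WordNet lemmatizer needs the POS
    # tag ('v' = verb, 'a' = adjective, 'n' = noun) to find the right root.
    words = [("walking", "v"), ("is", "v"), ("good", "a"), ("for", "n"),
             ("health", "n"), ("but", "n"), ("jogging", "v"), ("is", "v"),
             ("better", "a"), ("than", "n"), ("walking", "v")]

    for word, pos in words:
        print(f"{word:10s} lemma: {lemmatizer.lemmatize(word, pos=pos):8s} "
              f"stem: {stemmer.stem(word)}")

Running this shows exactly the contrast from the activity: the lemmatizer maps "is" to "be" and "better" to "good", while the Porter stemmer leaves those alone but correctly reduces "jogging" to "jog", doubled "g" and all.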
The next concept in NLP is a very famous one, called the n-gram: unigram, bigram, trigram, or in general n-gram. Given a sentence, it is very simple. Say the sentence is "I like to drink coffee". For unigrams, I create a dictionary with all the single words in it, so the unigram dictionary is simple: "I", "like", "to", "drink", "coffee". For bigrams, I want all pairs of adjacent words in the dictionary: "I like" is one entry, "like to" is another, "to drink" is the third combination in my bigram dictionary, and "drink coffee" is the last. For trigrams, entries like "I like to", "like to drink", "to drink coffee" come in. Unigrams and bigrams are used in applications like Google; I will tell you exactly where, but first let us see how this can be used. The idea is this: a lot of text content exists in the world, and you can crawl it automatically; all the content of Wikipedia is available as a free database you can download and use, and there is the Google News corpus and others. From such content, can we create a dictionary of which combinations of words occur together? That is the idea of the bigram. A trigram is not always necessary, but sometimes a single term spans more than two words, like "computer science and engineering": "computer science" and "science and engineering" on their own are different things, and you need a longer window to capture the full term. And then there is the general n-gram, which is anything beyond a trigram: a four-gram, a five-gram, a six-gram, and so on.

Now, an activity: consider the sentence "Learners' engagement in the class and their interaction with peers impacts their performance in the assessments." The sentence may not be perfectly formed, but take it as it is. Can you find the unigrams, bigrams, and trigrams of this sentence? Not the higher n-grams; if you want, go up to four-grams, but not beyond that. After you have done it, please resume the video to continue.

The unigrams will be the individual words: "learners", "engagement", "class", "their", "interaction", "with", "peers", "impacts", and so on. The bigrams will be "learners engagement", "engagement in", "in the", "the class", "class and", everything like that. The trigrams will be "learners engagement in", "engagement in the", "in the class", and so on. Notice what we are actually doing: sliding a window across the sentence. The window slides one word at a time: you take the first three words, then the next three words starting one position later, and so on, as the sketch below shows. That gives the trigrams, and a four-gram dictionary can be formed the same way.
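To see the sliding window in code, here is a small sketch in plain Python; the whitespace tokenization and lower-casing are my simplifications for illustration.

    # Extract n-grams by sliding a window of size n across the tokens.
    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    sentence = ("Learners engagement in the class and their interaction "
                "with peers impacts their performance in the assessments")
    tokens = sentence.lower().split()

    print("unigrams:", ngrams(tokens, 1))
    print("bigrams: ", ngrams(tokens, 2))
    print("trigrams:", ngrams(tokens, 3))

Each call just moves the window forward one token at a time, exactly the sliding described above.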
Why are we doing this? It is very important, and I will show you why; it is very simple. Let us see. Suppose you want to know which word will come next. Have you ever wondered, when you type in Google, how you type the first word and it picks up and shows the second word? How does that second word appear automatically? The Google search engine is not just NLP: it depends on the context, where you are from, your personal profile, and what kind of words you have searched already. There is a lot of modelling of you happening inside the search engine itself. But the basic NLP idea is this, and it is the Markov assumption. I talked about the hidden Markov model last week; the Markov assumption is basically this: the current word depends on the last word. Let me take the same example, "I like to drink". What is the probability of "like" occurring if the given word is "I"? You type the word "I"; what is the probability that the second word will be "like"? If you consider a lot of English sentences, the most common word after "I" will probably be "am", not "like". But sentences with "I like" or "I want" also occur often, so "like" will still have some probability. Next, the probability of "to" given the current word "like" is high: if the current word is "like", the next word is mostly "to", because that is how sentences are formed in content like newspapers and Wikipedia. So from a content database you can construct this probability, the transition from "like" to "to". Remember the transitions we talked about? We can create exactly that kind of transition probability; that is what is happening here.

Now, the probability of "drink" given only "to" is not high, because after "to" a lot of other words are possible. But suppose I condition not just on the previous word "to" but also on the word before it, so the history is "like to". Think about sentences such as "I like to go", "I like to drink", "I like to eat": after "like to", only certain words tend to come, so words like "drink" and "go" now get a better probability. In general, if you know more of the history, the previous words at positions n-1, n-2, n-3, you might be able to predict better. But the Markov assumption says this is not needed, and that you will mostly be able to predict the next word from the previous word alone: the probability of the current word given the whole history, P(w_n | w_1, ..., w_(n-1)), is almost equal to the probability of the current word given just the previous word, P(w_n | w_(n-1)). There is no need to condition on n-2, n-3, and so on. That is where the Markov assumption comes from: the current event depends mostly on the previous event. We have seen this already; I talked about bigrams on a previous slide, and this is a special use of them. Once you have built a bigram dictionary of all the word pairs occurring together, you just have to count how many times each pair like "like to" occurs and compute the probability.
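Here is a minimal sketch of that counting in Python. The three-sentence corpus is a toy stand-in for the huge crawled content described above, and the sentences are my own.

    # Count-based bigram probabilities:
    #   P(next | current) = count(current, next) / count(current)
    from collections import Counter

    corpus = ["i like to drink coffee",
              "i like to eat",
              "i am happy"]

    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs

    def p_next(current, nxt):
        return bigram_counts[(current, nxt)] / unigram_counts[current]

    print(p_next("i", "like"))   # 2/3: "like" follows "i" in 2 of 3 cases
    print(p_next("like", "to"))  # 1.0: "to" always follows "like" here

To suggest the next word, you would simply pick the candidate with the highest p_next for the word just typed.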
Once you have computed those probabilities, you automatically know which word will come next given the first word. So when you type into Google, as soon as you type the first word, it picks the next word based on the probability with which those word pairs occurred in the content. Google search is actually more complicated, because it also considers the current trend, what people are searching in the particular context, and the latest news for the particular IP address location; all of that is taken into account. But in general, the idea used is finding the probability of the current word given the previous word.

Let us look at one simple example to understand that more clearly. Take the sentences: "Mark likes to eat meals with his family. Kai likes to sing and eats meals with his friends. Kiran likes music." Suppose the current word is "likes": what is the probability that the next word will be "to"? In your bigram dictionary you will have three entries starting with "likes": "likes to", "likes to" again, and "likes music". So three bigrams start with "likes", and in two of them the second word is "to". Two out of three: the probability is 2/3, about 0.66. Very simple. So what happens now is that when you type the word "likes", the system automatically suggests "to", because "to" is what the probability picks out. That is the very, very basic form of predicting which word comes next; I hope you understand it. This is again a kind of probability, defining state transitions.

All of this needs a huge dictionary built from a lot of content. English has a lot of content available on the internet compared to most other languages, but building this dictionary is still not easy. You can go ahead and take all the content in Wikipedia or Google News and create a bigram dictionary; that much is feasible if your system can support it. A trigram is possible too, and sometimes a four-gram is also wanted, because it might give a better next word. That is not easy, it requires a lot of computational power, but Google already did it: they actually built a four-gram database and they have it with them. Still, the Markov assumption says this is not required; just using the bigram is enough.

Let us look at the other concept: how similar two words or sentences are. Let us start with words. Take the word "analytics" on the slide and some candidate words, and ask how similar each candidate is to it. This similarity is computed by the minimum edit distance: the minimum number of edits you have to make to turn one word into the other. For example, take "analysis": the beginning, "analy", is almost the same in both words; then, to reach "analytics", the "s" should be replaced with a "t" and a "c" should be inserted. So you have to make two edits, and the distance is two. For another candidate word on the slide, three edits are needed; for a third, only one edit. How do you find this? There is a simple rule: you apply operators, and the operators are insert, delete, and substitute. That is how the similarity is identified. Look at the example again: I substituted "s" with "t", which is the substitute operation, and I inserted the letter "c", which is the insert operation. One substitute and one insert means two operations are needed, so the minimum edit distance is two. For the candidate needing three edits, I might have to insert three letters, three actions; and for the last candidate, only one letter is missing, so I insert just that one letter.
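The standard way to compute this is with dynamic programming over a matrix, often called the Wagner-Fischer algorithm; the lecture does not give code, so the following is a minimal sketch of my own.

    # Minimum edit distance (Levenshtein) via dynamic programming.
    # dist[i][j] = edits needed to turn a[:i] into b[:j].
    def edit_distance(a: str, b: str) -> int:
        dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dist[i][0] = i                      # delete all of a[:i]
        for j in range(len(b) + 1):
            dist[0][j] = j                      # insert all of b[:j]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,         # delete
                                 dist[i][j - 1] + 1,         # insert
                                 dist[i - 1][j - 1] + cost)  # substitute/match
        return dist[len(a)][len(b)]

    print(edit_distance("analysis", "analytics"))  # 2

The matrix compares every prefix of one word against every prefix of the other, which is exactly the "big matrix" comparison described next.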
But how do we identify these minimum edits automatically? It is not easy. The algorithm has to work out, for example, that a candidate's letters actually match the last six letters of "analytics" and not the first three. That means building a big matrix and comparing each position of one word against every position of the other to see where they match. It looks complex, but with the proper algorithm it is not that complex; it is easy to do. That is how the minimum edit distance is computed, and that is how similar two words are, based on the edits between them. This is very, very useful in bioinformatics: a lot of work in that field uses exactly the minimum edit distance, and there is real innovation happening there. If you are interested, go and search for minimum edit distance in bioinformatics; you might find the answer.

So, have you ever seen this application? If yes, where? Think about it. Again, in Google search: you can type anything, and it picks up the right word and shows it. That depends not just on minimum edit distance but also on the current trend in the particular location and area; but in general, when you mistype a word, it automatically corrects it. Even in PowerPoint, which I am using, a wrong word is shown underlined in red, right? That is exactly this: word correction in PowerPoint uses it. So you have seen this everywhere. These are the basic concepts I want you to understand in NLP. Why am I talking about these concepts? Because I want you to use them to create an automated essay grading system in the next video. Thank you.