In this video, we will talk about bag of words and how we can use it to automatically grade essays. Bag of words is simple: it takes the frequency of the words occurring in the sentences. It picks each word and counts how many times it occurs, and we put those frequencies in the bag of words. It can also be combined with finding similar words, as we saw earlier. If you have similar words, find them first and then put them in the bag of words, because some students might write a word with a wrong spelling or something like that. The result is a sparse vector; I will talk about what a sparse vector is in detail.

So, what happens first? Remove what you do not need: words such as "in", articles, prepositions, conjunctions, verbs. Whether you remove them, and how much you remove, is up to you. Correct the spelling in the sentences, for example by using similar words; that is very important. Then you have to create a dictionary. That dictionary should have all the words, with the frequency of each word.

Let us look at bag of words in detail on this slide. Sentence 1 says "Students interact with peers in class", that is, students are interacting with their peers in the class. Sentence 2 says "Peer instruction increases students' interest", something like that. Take the bag of words of sentence 1: the word student occurred once, interact occurred once, with occurred once, peer occurred once, in occurred once, class occurred once. That is the bag of words of sentence 1. For sentence 2, which some other student wrote: peer occurred once, instruction occurred once, increase occurred once, students occurred once, interest occurred once.

See, I just wrote each word in the form given in the sentence. What you can also do is use the lemma or stem form of it. Using the root or stem form reduces the length of this vector. I will talk about how big the vector is going to be. It is not easy.
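The per-sentence counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `normalize` and `bag_of_words` are hypothetical, and `normalize` is a deliberately crude plural-stripper standing in for a real stemmer or lemmatizer. No words are removed here, so "with" and "in" are still counted, as on the slide.

```python
import re
from collections import Counter


def normalize(word):
    """Crude stand-in for stemming (an assumption, not a real stemmer):
    drop a trailing plural 's', but leave words ending in 'ss' alone."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def bag_of_words(sentence):
    """Lowercase the sentence, split it into words, and count each word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(normalize(token) for token in tokens)


sentence_1 = "Students interact with peers in class"
sentence_2 = "Peer instruction increases students' interest"

bow_1 = bag_of_words(sentence_1)
bow_2 = bag_of_words(sentence_2)

for word, count in bow_1.items():
    print(word, count)
```

Because of the normalization step, "students" and "peers" are counted as "student" and "peer", which is the shortening effect of the root form mentioned above.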
So, that is also possible, right? Let us look at that. If there are two sentences, I want to create one bag of words for the two sentences. Student occurred twice, one time here and one time here. Interact occurred once, with occurred once, peer occurred twice, in and class once; all the others occurred once. I hope you understand how I created this set from these two sentences. That is all there is to bag of words; it is very, very basic. You have to understand how this set has been created from these sentences: for each unique word in the sentences, I have just counted how many times it occurred. That is the frequency of the words. For similar words, the frequency can be increased together.

So, I have a bag of words for the two sentences. Now take the same two sentences again. Consider this to be my dictionary, the complete set I have. The set has eight words. The first word is student, the second word is interact. Notice that I removed the word "in"; it is not here. What I did is give a position to each word. Instead of taking each word and putting its frequency, I have a dictionary, and my dictionary has only eight words: student, interact, with, peer, class, instruction, increases, interest. That is it. The sentences come out of this dictionary.

I need to create the numerical form of the sentences. With this dictionary, I can say student is in the first position, and it occurs in sentence 1, so one. The second position is the word interact. Does interact exist in sentence 1? Yes, it exists. If a word exists, mark it as one; if it does not exist, mark it as zero. Does with exist in sentence 1? Yes, so the third position is one. The fourth word is peer. Does peer exist in the sentence? Yes, so the fourth position is one.
Peer exists. Class? Yes, class exists. Instruction? No, it does not exist, so the sixth position is zero. Increases does not exist, and neither does interest. So, my vector for sentence 1 is 11111000. If someone asks me, can you form the sentence from it? The sentence has student, interact, with, peer, class. That is it; those are the words this particular sentence has. The order is not important, for example that student occurred first. I will tell you more about that; it is very important. What matters is which words from the dictionary exist in the sentence. So sentence 1 has five words from this dictionary, and their positions are given by 11111000. Having only ones and zeros makes it easy for machine learning to classify, because one or zero is very simple.

Sentence 2 has student in it too, so student is marked here, although it is not the first word of that sentence. That is what I am saying: the order of the words is not important in bag of words. Student is one; then zero, zero, because interact is not there and with is not there. Peer is there and class is not there. Instruction, increases and interest are there; instruction is position six. So the vector for sentence 2 is 10010111. What I am trying to explain is very simple: I am just following the order of the dictionary, one, two, three, four, five, six, seven, eight. I hope you understand what I am saying. It is basically sentence 1, sentence 2, and which words occur in each sentence.

Now consider many students writing these kinds of sentences, say five sentences each. If you combine the words, you might get a bag-of-words dictionary of a thousand words, sometimes a hundred thousand, sometimes even a million, because if you consider all the forms of the words, the dictionary gets big. It grows towards all the existing words in English plus the different forms in which they occur. It is very, very huge, so this can become a big vector. But let us take it as a thousand words. Suppose the dictionary has a thousand words and a student writes a sentence like this.
So, consider that there are six words in the sentence; then there are a lot of zeros, all the way up to a thousand. It is not just three zeros like here; it is zeros up to a thousand. This vector is called a sparse vector: out of a thousand positions, only five or six words are there. So it is not a full vector, only a sparse one.

To avoid this sparseness in the sentence representation, we can use the index instead of marking every position. For example, student is at index 1, interact at index 2, with at index 3, peer at index 4, class at index 5 and so forth. Sentence 1 becomes 1 2 3 4 5 and sentence 2 becomes 4 6 7 1 8. So, instead of writing every vector in the complete dictionary form like this, you only list the words that particular sentence has: this sentence has only five entries, another one might have only four, something like that. It is simple to write it that way, and the sparseness is gone, but this also has a complexity of its own. Let us see what the problem is. It means you need to write a program: first the dictionary has to be there with its index, using hashing; then, for sentence 1 and sentence 2, you use only the indices, and that is enough to represent that particular sentence.

Why are we representing sentences as numbers? Because when you give words to a machine learning classifier like Naive Bayes or a decision tree, it will not be able to understand the words. A decision tree will not work on raw words, and neither will Naive Bayes or other classifiers such as SVM. Instead, you have to convert the words into a numerical form, and this kind of bag-of-words approach does that conversion. I hope you understand that we had two sentences and converted them into numbers. Let us look at where this can be applied.
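Both representations of the two example sentences, the full one-or-zero vector and the shorter list of indices, can be sketched as follows. This assumes the eight-word dictionary from the slide and assumes the sentences have already been cleaned into word lists (so "students" is already "student"). The names `position`, `binary_vector` and `index_list` are hypothetical, and a Python `dict` plays the role of the hashed index.

```python
# The eight-word dictionary from the slide, in slide order.
vocabulary = ["student", "interact", "with", "peer",
              "class", "instruction", "increases", "interest"]

# Hash-based lookup from word to its 1-based position in the dictionary.
position = {word: i + 1 for i, word in enumerate(vocabulary)}


def binary_vector(tokens):
    """One slot per dictionary word: 1 if the word occurs, otherwise 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def index_list(tokens):
    """Compact form: only the dictionary positions of the words that occur."""
    return [position[token] for token in tokens if token in position]


# Assumed to be already preprocessed ("in" removed, root forms applied).
sentence_1 = ["student", "interact", "with", "peer", "class"]
sentence_2 = ["peer", "instruction", "increases", "student", "interest"]

print(binary_vector(sentence_1))  # [1, 1, 1, 1, 1, 0, 0, 0]
print(binary_vector(sentence_2))  # [1, 0, 0, 1, 0, 1, 1, 1]
print(index_list(sentence_1))     # [1, 2, 3, 4, 5]
print(index_list(sentence_2))     # [4, 6, 7, 1, 8]
```

With a thousand-word dictionary, `binary_vector` would still return a thousand entries per sentence, while `index_list` stays as short as the sentence itself. The index list here happens to keep the sentence's word order, but bag of words itself does not depend on that order.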
Suppose you have 100 students writing an essay, and the essays have been evaluated by human experts, say two or three teachers, who rated each essay as excellent, very good, average, poor, something like that. So you have graded them manually. If you want to create an automatic algorithm to grade these essays, what do you do? Consider what you learned in the previous slide. Can you use that knowledge, just the bag-of-words approach and the similarity approaches, to create an automated essay grading system? Can you think of it? Take a moment, think about it, write down the steps of exactly what you would do, and after you have done that, resume the video to continue.

There is no response to submit here; instead, I will explain it. Let us see how it goes. Student 1 wrote some sentences. Suppose the essay is about coffee: "I like coffee. I prefer roast level three or medium; four is good", and from some country, something like that. Another student says that coffee is not good for health; maybe someone does not like coffee. Someone else says coffee is very important and good for health. All these things come in. So, a hundred students wrote this kind of essay about coffee, and you have a teacher grading them as good, average, or poor. Let us say there are only these three grades.

Now, the idea is that you have to create a bag of words. From all these 100 essays, you have to identify all the unique words and put them in one dictionary. So the bag of words for the big combined text is: I, like, coffee, prefer, roast (sorry for my handwriting), level, and so on. You might notice that what I am talking about here relates to bi-grams: if you put the words together, you can actually count them as bi-grams.
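Collecting the unique words from several essays into one dictionary, and pairing neighbouring words into bi-grams, might look like the sketch below. The three short essays are paraphrased from the coffee examples in the lecture, and the names `tokenize`, `build_dictionary` and `bigrams` are hypothetical. No stop-word removal or spelling correction is applied here.

```python
import re

# Short stand-ins for the students' coffee essays (illustrative only).
essays = [
    "I like coffee",
    "Coffee is not good for health",
    "Coffee is very important and good for health",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(texts):
    """Give every unique word a 1-based index, in first-seen order."""
    index = {}
    for text in texts:
        for word in tokenize(text):
            if word not in index:
                index[word] = len(index) + 1
    return index


def bigrams(tokens):
    """Pair each word with the word that follows it."""
    return list(zip(tokens, tokens[1:]))


dictionary = build_dictionary(essays)
print(dictionary)
print(bigrams(tokenize(essays[0])))
```

With a hundred real essays the same loop applies unchanged; only the size of `dictionary` grows.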
A frequency can be used, but instead of using frequency, let us say I am creating just one set, one dictionary from all the existing words. Once you have that, take student 1 (that is an id): student 1 has some vector, 1, 0, 1, 1, 0, 0, 1, 1, 0, some sparse vector, with the label good. Student 2 has something similar. So, the idea is that the students' essays have now been converted into a matrix with the labels good, average, poor. There are three labels, so it is not a binary classifier; it is a multi-class classifier with three classes. You have a student id and you have the features. There is not just one feature: the number of features is the length of the dictionary, and the dictionary can go to 5,000 or 10,000 words. That is how big the dictionary is.

This is the basic form of automatically grading your essays. What do you have to do? You have a hundred students' data and you have done all the grading. You take 10-fold cross-validation, train the system and test it ten times, and you create a reasonably accurate classifier. Feed it any essay and it will automatically give you average, good, or poor, so you do not need to grade by hand the next time. It is a very simple, basic form that I am talking about, and it will not yield a good result, you know. But you can look up the latest papers and try to understand what they do. That is very interesting, if you can do it.

I just want to say one more thing. Any machine learning classifier in supervised classification has this form: a matrix of features x1, x2, x3, and weights we want to find, a weight associated with each of x1 to xn, so that the weighted features equal the label. The label can be binary or multi-class. It is just y, the label.
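The matrix-with-labels layout and the 10-fold split described above can be sketched with plain Python. Everything here is an assumption for illustration: the four essays and their grades are invented toy data, the names `to_vector` and `k_fold_indices` are hypothetical, and no classifier is trained. In practice a library classifier such as Naive Bayes or an SVM would be fitted on each training split and scored on the held-out fold.

```python
import re

# Toy data: (student id, essay text, teacher's grade). Invented for illustration.
graded_essays = [
    ("student_1", "I like coffee", "good"),
    ("student_2", "Coffee is not good for health", "average"),
    ("student_3", "Coffee is very important and good for health", "good"),
    ("student_4", "I do not like coffee", "poor"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# One dictionary built from every essay; its length is the number of features.
vocabulary = sorted({w for _, text, _ in graded_essays for w in tokenize(text)})


def to_vector(text):
    """Binary bag-of-words row: 1 if the dictionary word occurs in the essay."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


X = [to_vector(text) for _, text, _ in graded_essays]  # feature matrix
y = [label for _, _, label in graded_essays]           # labels: good/average/poor


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


# With 100 graded essays, 10 folds give 90 training and 10 test essays each time.
splits = list(k_fold_indices(100, 10))
print(len(vocabulary), X[0], y[0])
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each row of `X` together with its entry in `y` is one student's essay in the form x1 ... xn with label y.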
So, what I am trying to say is that this is exactly the form of machine learning classifiers, and it is exactly what we did with matrices in our school days. That is the basic form of machine learning. If you can cast any problem into this particular form and apply it, then understanding what the algorithm is and how it works becomes easy. If you want to go into detail about that form: not every matrix has a perfect solution. If you do not have a perfect solution, you have to find the nearest solution. That is exactly what happens in machine learning. Not every problem has a perfect solution where the weights exactly match the labels. So we have to identify the nearest possible solution, the global minimum, with the least possible error. Some classifiers are able to do that, some are not. That is why the performance of classifiers varies, based on the way they approach the problem.

I hope you understand what bag of words is and how to use it to automatically grade essays. If your students have done assignments and you have graded them, or your teachers or your friends have graded them, take the assignments and the marks as input, convert them into a bag of words with one big dictionary, and create the vectors. If you have this kind of data, that is very good. A simple script can help you convert it, and you can check whether you can create an automated grading system or not; it is very easy to do. If you started with this course and are trying to understand programming, this is a very good, simple example to learn from and try out, because identifying the dictionary and its indices is not tough at all. It is very possible with the pandas library. So, try it out and check whether it works or not. Thank you.