In this video, let us talk about NLP tools. I want to introduce one famous tool, and also cover what is the latest in natural language processing. The bag-of-words approach I talked about earlier is more than ten years old. The tool is corenlp.run. This library was created by Stanford University and has been available for more than 15 years; they started with the basic bag-of-words approach in a Java library, and wrappers are now available in several scripting languages. Earlier I was talking about how to compute the dictionary from a given sentence — if you use this library, it gives you all of those things automatically. So this library is very, very interesting. The NLP we have seen so far is just the tip of the iceberg; NLP can do a lot of other things.

Let me show you a demo of corenlp.run. If you go and type corenlp.run, you will get a page where you can type any text to annotate. There is also a version you can download. You can do annotation based on parts of speech (POS tags), named entities, dependency parses (the dependencies between the words of a sentence), and more. You can pick from different languages; let us pick English, and I will just use the existing example sentence rather than type anything: "The quick brown fox jumped over the lazy dog." This is a very famous sentence because it includes all the letters from a to z; when you learn typewriting, you might have started with it. Now look at the part-of-speech output: "quick" and "brown" are adjectives, "fox" is a noun, and "jumped" is a verb. What I am trying to say is that if you type a sentence into this tool, it automatically identifies which word is a noun, which is a verb, which is an object, which is an adjective — everything is identified automatically. This tool has been trained with a lot of training data.
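The kind of annotation just described can be sketched with a toy tagger. This is only a hypothetical lookup table for the demo sentence, not the CoreNLP model itself, which learns its labels from large annotated corpora:

```python
# Toy part-of-speech lookup for the demo sentence.
# A real tagger like Stanford CoreNLP learns these labels from
# training data; here we hard-code them purely for illustration.
TOY_TAGS = {
    "the": "DT",      # determiner
    "quick": "JJ",    # adjective
    "brown": "JJ",    # adjective
    "fox": "NN",      # noun
    "jumped": "VBD",  # verb, past tense
    "over": "IN",     # preposition
    "lazy": "JJ",
    "dog": "NN",
}

def tag(sentence):
    """Return (word, tag) pairs; words outside the toy lookup get 'UNK'."""
    return [(w, TOY_TAGS.get(w.lower(), "UNK"))
            for w in sentence.split()]

print(tag("The quick brown fox jumped over the lazy dog"))
```

The real tool produces the same shape of output — a tag per token — but generalizes to unseen sentences because the tags come from a trained model, not a lookup.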
So you can use it directly: call the library's part-of-speech function and it will give you these values. It also gives you the basic dependencies: "quick" and "brown" are adjectival modifiers of "fox", not of any other term, and the determiner "The" attaches to that particular noun. And this is very interesting: "jumped" connects "fox" to "dog" — the relation between the fox and the dog is "jumped". That is exactly what it identifies; it extracts this information from the text automatically. If you want more, enhanced dependencies are also there. More importantly, there is the information extraction I was talking about: it extracts entities and the relations between them. "Lazy dog" is an entity, "quick brown fox" (or just "fox") is an entity, and the relation between these two entities is "jumped".

Let us go on to the latest thing in NLP — latest in the sense of since 2013 or 2014, not last year or this year. The latest thing is word embeddings. Here each word is represented as a vector. The vector dimension can be 50, 100, 150, 200 or 300; in general, 300 seems to work well across many domains. In the education domain, there is a paper in the 2018 EDM conference on which dimension is good for educational data sets, and it also seems to be 300. But you can fix the dimension to whatever you want. You can also choose the context: you can create word vectors based only on education data, on a news data set, or on Wikipedia — all are possible. So what does "word vector" mean? You feed a huge amount of data — say, the text of news articles — into a deep neural network.
The network learns the relationship between words based on their locations and their co-occurrence with other words, and gives you numbers. The output layer of the network defines the dimension: 50, 100, 300. The more dimensions, the more complexity and the more detailed the values; with fewer dimensions you are abstracting the words more. 50 or 100 also work well, so you can try different sizes. If you want to create your own word vectors from your own sentences, the program and tools are all available — you just have to use the program by Tomas Mikolov; I will give the link on the next slide. The idea behind word vectors is "you shall know a word by the company it keeps": a word's vector is based on the words it co-occurs with. That is the exact core of this.

Let us see what a word vector is with a very simple example, with dimension 2 rather than 300. I have two vectors, one for "pen" and one for "pencil". Pen is (3.2, 2.1), so the pen vector goes from the origin to that point. Pencil is (3.4, 2.13) — almost the same. Now, how closely are pen and pencil related? Pen and pencil are both stationery, part of a geometry box, things people use in schools; so these words might co-occur very often in a given article or piece of content. That is why their vectors come out close to each other. So can we also measure the distance between pen and pencil? We have already seen how to compute the distance between two vectors: using Euclidean distance. This Euclidean distance actually does this.
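The pen/pencil comparison above can be computed directly. A minimal sketch using the same toy 2-dimensional vectors from the example (real embeddings would be 50- to 300-dimensional and learned from a corpus):

```python
import math

# Toy 2-D word vectors from the example.
pen    = (3.2, 2.1)
pencil = (3.4, 2.13)

def euclidean(u, v):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(euclidean(pen, pencil))  # a small value -> the two words are close
```

The distance comes out to about 0.2, which is small on this scale — exactly the "pen and pencil are very close to each other" observation.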
What does it do? We have two vectors; let us draw the complete graph. The Euclidean distance between the two vectors is found like this: the difference between the x values gives you one side, and the difference between the y values gives you the other side. Once you know those two values, by the Pythagoras theorem you know the distance between the two vectors. It is very simple. So you can use the Euclidean distance to measure how far apart two words are.

Now, how closely are pen and pencil associated? Say I give the system the word "pencil" and ask it to bring up all the words related to pencil — it can find pen, eraser, and so on. How does it find related words? It takes the vector for "pencil", computes the distance to all the other word vectors, and returns the nearest ones. That is exactly why this distance is computed.

Let us look at one tool. If you want to know more, please check Tomas Mikolov's code at code.google.com; the code is there for you to use directly. Go to Google and type "word embedding demo"; there are a lot of demos, just take the first one. When you go there, choose the English model trained on Google News — do not worry about "negative", "skip-gram" and so on for now; you can read up on them if you want to know more. It is a 300-dimensional model, created from Google News articles. I want to use the word "pencil": show the nearest words to "pencil", the top n of them.
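The nearest-word lookup just described can be sketched with a handful of made-up 2-D vectors — the real demo loads 300-dimensional Google News vectors, but the logic is the same: compute the distance to every other word and sort:

```python
import math

# Hypothetical toy vectors: stationery words placed near each other,
# unrelated words far away.
VECTORS = {
    "pencil": (3.4, 2.1),
    "pen":    (3.2, 2.2),
    "eraser": (3.0, 2.4),
    "dog":    (8.0, 0.5),
    "car":    (0.5, 7.5),
}

def nearest(word, k=2):
    """Return the k words closest to `word` by Euclidean distance."""
    query = VECTORS[word]
    others = [w for w in VECTORS if w != word]
    others.sort(key=lambda w: math.dist(query, VECTORS[w]))
    return others[:k]

print(nearest("pencil"))  # -> ['pen', 'eraser']
```

With real vectors the same loop over the whole vocabulary is what brings up pencils, crayon, eraser, pen, and so on.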
If you give "pencil", you get: pencils, crayon, eraser, crayons, notepad, pen, scribbling, paintbrush, scribbles. You see — if you give a pencil, all the related words come together. You can see the usefulness of this. Some people writing essays might use related words; with plain word matching we are not able to catch that. Someone might write "crayon", someone "pencil", someone might use "pen". By using word vectors you can identify these as similar words and perhaps give a better grading, something like that.

There is something even more important here than just finding similar words. Let us try some other words — type India, Delhi, or any academic or technology word. Delhi is the capital of India; what is Paris the capital of? Show the top words: France, right. How did it identify that? Again with simple word vectors; once you have the word vectors, you can do it, and I will show you how. (Morocco also shows up — a lot of people in Morocco speak French, and many travel to Paris, so it might be related, though I am not sure why Morocco appears; the right answer is France.) If Delhi is the capital of India, the country that Paris is the capital of is France. It is very simple.

Let us go and look at how it identified that. What happens here is very simple once you have the word vectors. They are 300-dimensional vectors, not 2-dimensional ones — too big for anybody to visualize. So let us just say there is a word "Delhi" and there is a word "India".
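Before looking at the vectors in detail, the analogy can be sketched as word2vec-style vector arithmetic: the answer is the word nearest to Paris + (India − Delhi). Here is a toy illustration with made-up 2-D vectors deliberately arranged so the capital-to-country offset is identical for both pairs (real vectors only approximate this):

```python
import math

# Hypothetical toy vectors: the Delhi->India offset equals the
# Paris->France offset by construction.
VECTORS = {
    "Delhi":   (1.0, 2.0),
    "India":   (1.0, 1.0),
    "Paris":   (5.0, 2.0),
    "France":  (5.0, 1.0),
    "Morocco": (4.4, 1.6),
}

def analogy(a, b, c):
    """Solve a : b :: c : ? by finding the word nearest to (b - a + c)."""
    target = tuple(VECTORS[b][i] - VECTORS[a][i] + VECTORS[c][i]
                   for i in range(2))
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(target, VECTORS[w]))

print(analogy("Delhi", "India", "Paris"))  # -> France
```

Notice that "Morocco" sits near France in this toy space too, which mirrors how it crept into the real demo's results.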
These are two vectors, simple 300-dimensional word vectors. Now I identify the relationship between these two; say it comes out as 0.9, very closely associated. This relationship between Delhi and India — this 0.9 — tells you whether one is the capital of the other, an enemy, another country, something like that. So what do I have to do? I take the vector for Paris and apply the same relationship, the 0.9 between Delhi and India, and the nearest word will be France. The idea is that somewhere else in the space, away from Delhi and India, there are two other vectors, Paris and France, and the relationship between them is the same as between Delhi and India. That is it. So check out this particular tool and explore it.

This is the latest in NLP, let us say for the last seven or eight years. After 2014, everything changed: the bag-of-words view we looked at, where every word is a feature, is gone. Now it is only vectors — no need for a million features, just the 300 dimensions per word. You can add or average them to combine words; you can compare two words, and if you think about how to do it, you can compare two sentences. So this is the latest in NLP. I just wanted to show you one NLP tool and also the latest in NLP in this video. That is all about word vectors and NLP tools. Thank you.