Okay, now going live. Okay, so hello everyone, I'm Sanjay Gupta. Welcome to Sanjay Gupta Tech School. Today we have one more session on AI. As you all know, we are running this Salesforce AI Associate bootcamp where various topics are being covered, and to cover one more topic, natural language processing, that is NLP, I have Nikita with me. So welcome, Nikita, to the channel. She will be sharing insights on what NLP is and what kind of questions you will face in the certification, right? So all the theoretical and conceptual things she will be discussing. Okay, I think we can start with the session. Over to you. So today we are going to build an understanding of natural language processing. There are three words in it: one is natural, another is language, and the third is processing. Natural means something which exists on its own, and here we are talking about human language, which has existed for ages. It is a very common thing, and it is something which, obviously, only humans would understand. So we need to put in some effort to make our computer learn it, process it, and then come back with an outcome. This is what we are going to learn today: how exactly a computer can be enabled to understand usual, basic human language. That is the gist of natural language processing. So let's see what definition of natural language processing we have. NLP is an area of research in computer science and AI concerned with processing natural languages such as English or any other language. It is used everywhere in the world, so we do not have to stick to English only; we can work with any language we want. Whatever language you speak can be entered into the NLP model, it will process it, and it will give you the output after a lot of parsing and other techniques through which it understands your language. Right. So this processing generally involves translating natural language into data. You should understand that whatever we write, whatever we say verbally, is understood by the machine in a certain format, and that format is not the linguistic one. Obviously, if we enter something, the model cannot understand it in that raw form. It has to segment it; it has to take those characters, words, phrases, and paragraphs and turn them into a totally different representation, and that's how it processes them. So this is what the whole journey of NLP looks like. Now, it started in 1950 with the Alan Turing test, and I have discussed this Alan Turing test in my Day Zero video, where you can go and check what exactly Alan Turing did in 1950. After that, the first use case of natural language processing came into play, which was language translation: from French to Spanish, from English to Spanish, from Russian to any other language. These were the first use cases of natural language processing, where I can type anything and it gives me a translated version in whichever language I want, right? So let's take a closer look at the last line I was reading: this processing generally involves translating natural language into data, into numbers, that a computer can use to learn about the world.
And this understanding of the world is sometimes used to generate natural language text that reflects that understanding. So if I take the example of ChatGPT: I enter my text, and this text can be anything, right? I can enter something like give me the details about natural language processing, or give me the details about wildlife. It is going to understand it and process it. Now, processing is not just a word; there are various techniques involved in this processing, which we will learn extensively in the near future. After that, it gives you the result character by character, word by word, or line by line. When you type something into ChatGPT, you must have seen that the output appears in such a way that it comes in character by character or word by word, and you just watch it. It looks like it is typing those words; it's not like a paragraph is just thrown at your face. It keeps on typing, typing, typing, and that's how the character-level generation comes into play. So this is what the first slide explains to you. Then, of course, we have sub-areas of NLP. What actually sits inside NLP is natural language understanding and natural language generation. These are the two components, or you can say the two impactful pieces, inside NLP which enable it to actually work with human language, where it understands, and then it generates an output with respect to whatever it has understood. Now, whatever we speak is built from a vocabulary, and here is a list I have put together where you can see everything that natural language encompasses. How is a natural language made? Any language is made up of a definite vocabulary. If you do not have the words to express your feelings, your sentiments, then you are unable to communicate. For any communication, you need your words; you need a set of vocabulary. And this is, of course, not only about English; it can be any language that you are comfortable in. So vocabulary is one of the things. Then grammar: there might be some grammatical errors when you input your language, so grammar has to be taken into consideration: how you are speaking, what you are speaking, and whether it makes sense. Whenever we say 'he asks me' versus 'he asked me', there is a real difference. So you should know that grammar plays a big role in whatever you want to say to the model and get output for. Then you have syntax. You should understand that if I write with proper syntax, the computer will also produce output with its own syntax. Initially, this was a problem: my words would be arranged in a certain manner, and the output would be generated in a manner where the two were not synchronized. These were drawbacks that have been worked on over the 50-plus years that NLP has been in development, with a great deal of research. Now the results are quite optimized, and we can get output in a language that actually makes sense. So that is why syntax is important. Then you have semantics, which means the meaning of the words, phrases, and sentences. Then the pragmatics of the sentence: the context and the intent behind it. That means reading between the lines becomes hugely important.
Your model should be able to understand what you are trying to say, rather than just the literal text you have input. So this is pragmatics, where your sentiment and your intent in saying or delivering anything is taken into consideration. Then you have discourse and dialogue. How you say something matters again: units larger than a single phrase or sentence, including documents and conversations, are all taken into consideration. Then you have phonetics and phonology. This means the sounds we make when we communicate. For example, we have Alexa and Siri. When I talk to Siri, I need to be very clear about the request, for example, 'play this song'. If I do not clearly deliver my idea of playing a song, Siri will either generate an outcome based on whatever it heard, which will likely not be correct if my communication got disrupted by unclear pronunciation, or nothing useful at all. Correct. So this is how Siri and Alexa work: they need concise, precise verbal communication, and if they do not get it, they will either deny a result or generate results which are not so useful for you. Last, we have morphology. That means how parts of words can be combined or uncombined to make new words. So that is how natural language has its own characteristics, and how input should be delivered in order to get the desired outputs. Then we hop onto the area where we see how exactly NLP works, how it actually delivers the results it delivers in the field. NLP uses algorithms and methods like LLMs, statistical models, machine learning, deep learning, and rule-based systems to process and analyze your text. These techniques are called parsing. This parsing is very, very important for you guys in order to clear the AI Associate exam. There will be various questions on parsing. They will ask you: what is segmentation? What is semantics? What is lemmatization? All these techniques will be asked about, and you should know each of them in a brief, concise manner; otherwise, you will lose marks and might not get the certificate. So these techniques, collectively called parsing, involve breaking down the text into chunks. Now, the model is not capable of directly understanding a paragraph as a whole. Of course, when I read anything, I understand it either through context or through vocabulary, so I have a sense of understanding; but the way I understand might not be the way a computer wants to understand. There are various tokenized forms in which the computer or model understands text. The way you break the paragraph, the way you segment your paragraph or line: segmentation, lemmatization, all these processes are imposed on the text that you put in. So you have to be very careful about learning these techniques so that you can get your desired results. Sorry, I think my slides lagged. Yeah, I think you can start the slide show again. Maybe you can jump to the next slide and then start the slide show. My writing pad was actually connected and it was making some stray lines. Okay, sorry. No problem. So here we have parsing, which is divided into two segments: one is syntactic parsing and the other is semantic parsing. Syntactic parsing looks at the general grammatical syntax rules: your words, how they are phrased, all of that. Semantic parsing is a little like sentiment analysis: you are reading between the lines and sentences. That's what semantic parsing looks like. These are the two types of parsing that you will majorly see in the Salesforce Trailhead program. So here you go with the first one. What does syntactic parsing look like? You can see that syntactic parsing is where elements of natural language are analyzed to identify the underlying grammatical structure. When you talk about syntactic parsing, you are talking about the general grammatical structure: how a sentence is broken down into different segments, then inferred by the model, then interpreted, and then the output is produced. You can see the sentence at the top. It has been broken down into a noun phrase and a verb phrase. The noun phrase is then characterized by what kind of noun is there; here it is a proper noun phrase, and you know that proper nouns are all the names we have. If it had been a city name or anything like that, it would also have been characterized as a proper noun. Then the verb phrase is characterized into the verb and another noun phrase, if it has one; that verb is 'ate'. Finally, we have a noun phrase which is further characterized into a determiner and a noun: the noun is 'apple' and the determiner is 'the'. So this is how the characterization of a statement or line looks when you talk about parsing. These are the two types of parsing that you see in the Trailhead program.
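To make that parse tree concrete, here is a minimal sketch using NLTK's chart parser with a toy grammar. The subject 'Sam' is an assumption, since the exact sentence on the slide isn't quoted here; the tree structure mirrors the one just described.

```python
import nltk
from nltk import CFG

# Toy grammar mirroring the slide: sentence -> noun phrase + verb phrase
grammar = CFG.fromstring("""
S -> NP VP
NP -> PropN | Det N
VP -> V NP
PropN -> 'Sam'
Det -> 'the'
N -> 'apple'
V -> 'ate'
""")

# Chart-parse the sentence and print its grammatical structure
parser = nltk.ChartParser(grammar)
for tree in parser.parse("Sam ate the apple".split()):
    tree.pretty_print()
```

Running this prints a tree with the noun phrase (the proper noun) and the verb phrase (the verb 'ate' plus the noun phrase 'the apple'), exactly the characterization described above.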
Yeah, here I just want to add one thing about why, as you mentioned, we need to know the specifics of these techniques. These are important because whenever anyone appears in the exam, there will be three or so options available, and it is very confusing which one to select; it seems like every option is correct. If you know the specific meaning or explanation of a particular topic, it prepares you to identify the correct option immediately as you read the question, right? So if you go through all the sessions in detail and try to understand the concepts, I don't think you need to use any of the dumps which are popular. Just try to understand the concepts so that you clear the certification, and whenever you go for an interview, there also you can explain those topics in detail. Okay, please go ahead. Correct, okay. So now we are going to classify syntactic parsing as well into the various techniques it consists of. The first one is segmentation. Then we have tokenization. Mostly, the exam question is about the difference between stemming and lemmatization. Then you have parts-of-speech tagging and named entity recognition. The last two are pretty simple; you can just read through them and be done with it. They are there in the Trailhead module. Here, however, we have the holistic application of segmentation, tokenization, stemming, and lemmatization. So we'll look at segmentation first, where larger texts are divided into smaller, meaningful chunks. Segmentation is a process where you break into pieces whatever text you have. It could be a large text, or even a single word or sentence; anything that has to be segmented into its stems, prefixes, and suffixes. These are grammatical terms, if you have ever heard of them: a prefix is what is attached at the beginning of a word, and a suffix is what follows at the end. These are two words you should be clear about when it comes to segmentation. Segmentation usually occurs at the ends of sentences, at punctuation marks, to help organize the text for further analysis. So this is what segmentation does: it makes larger chunks and longer sentences smaller by putting them into pieces and characterizing each piece accordingly; if it is at the beginning, it's called a prefix, and if it is at the end, it's called a suffix.
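As a rough illustration of the sentence-level part of that, here is a minimal sketch of rule-based segmentation in Python. Splitting on punctuation like this is deliberately naive (real segmenters handle abbreviations, decimals, and other edge cases), and the sample text is just an assumption.

```python
import re

text = "NLP began in 1950 with the Turing test. It has many techniques! Does it work? Yes."

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace
sentences = re.split(r"(?<=[.!?])\s+", text)
for s in sentences:
    print(s)
```

This prints the four sentences on separate lines, organizing the larger text into smaller chunks for further analysis.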
The next technique is called tokenization. Tokenization, again, is a subset of segmentation. As you can see, I have made this image ready where you are given the text 'This is a sample'. If I tokenize it, it results in the separate words: 'This', 'is', 'a', 'sample'. This is how the chunks are broken into a format and then fed to the model. It is a preprocessing step in which a long string of text is broken down into smaller units called tokens. Tokens can be words, characters, or subwords. Tokenization is not going to characterize a word into suffix or prefix; rather, the chunk is broken down into small units. Tokens are the building blocks of natural language processing, and most NLP models process raw text at the token level. So what comes after this? After you have gotten this tokenized format, you will learn vector analysis, where these words are stored in the form of vectors and matrices; that is something we will do in the mathematics part of NLP later on. Up to now, this is just the theoretical aspect of tokenization, where you can see that the sentence is broken down into small words. That is tokenization. Yeah, I remember when I started learning Java. In Java also, this tokenization concept is available, and in most programming languages: we store this data in the form of a string, and we have some predefined methods through which we can divide the text into different tokens. So it is good that it is used here also. Those from a programming background watching this session can relate to how tokenization is already done in programming and is included here for preprocessing, right? Yes, it is just the way we have an interpreter or a compiler in any programming language, right? They convert the low-level language we input into assembly or high-level language... sorry, I think it is the other way: the high-level language is converted into the low-level language. You explained it in reverse, yeah. Yeah, yeah, yeah. So this is what happens when we interact through any programming language, whether it is a Python IDE or a C programming IDE; to interact with it, you have to have a compiler or interpreter. Now here we don't have that case. It's just human language, and human language has to be made understandable to the computer. So how do we do that? This is the process of doing exactly that. Yeah.
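Here is a minimal sketch of that idea in Python, assuming simple whitespace tokenization and a made-up integer vocabulary; production models use more sophisticated subword tokenizers, but the principle of turning text into tokens and then into numbers is the same.

```python
text = "This is a sample"

# Word-level tokenization: split the string into tokens
tokens = text.lower().split()            # ['this', 'is', 'a', 'sample']

# Models consume numbers, not strings: map each unique token to an integer id
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]

print(tokens)   # ['this', 'is', 'a', 'sample']
print(ids)      # [3, 1, 0, 2] with this vocabulary
```

These ids are what later get turned into the vectors and matrices mentioned above.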
Then you have stemming. Stemming and lemmatization are, again, very, very important; we have a quiz question based on this, so focus here. In stemming, words are reduced to their root form, also called the stem. You can see that 'studies' and 'studying' end up with two different roots: one is 'studi' and one is 'study', right? This is what stemming does. It doesn't apply any grammatical technicality; it only chops off the part of the word it thinks is an ending and gives you what remains, and the result doesn't have to be grammatically apt. That is the difference between lemmatization and stemming: lemmatization is a more advanced technique, a little costlier too, and you could say it is an updated version of stemming. For example, 'breaking', 'breaks', or 'unbreakable' are all reduced to the word 'break'. So stemming helps reduce variations of word forms, but depending on the context, it may not lead to the most accurate stem. Look at the two examples that use stemming: you can see that stemming removes the suffixes here, and 'studi' and 'study' are both taken as roots. Now, how is it different from lemmatization? Let's see. Lemmatization is a more sophisticated technique that uses morphological analysis to find the base form of a word, also called the lemma. Lemmatization reduces words to their root and also takes the part of speech into account. Part of speech means whether it is a noun or a verb; interjections, prepositions, and so on are all parts of speech. These are taken into account to arrive at a much more valid root, or lemma. As you can see, both 'studies' and 'studying' get a single root, 'study', which is a more apt root than the two roots we got from stemming, S-T-U-D-Y and S-T-U-D-I. So what is the difference between stemming and lemmatization? You can see that stemming results in one root, 'chang', which has no meaning here, from words like 'change', 'changing', 'changes', 'changed', 'changer'. Then you have lemmatization, where 'change', 'changing', 'changes', all these words are taken into consideration, and a single word which also makes sense is produced: 'change'. So lemmatization is a more refined technique, and it is useful as a preprocessing step alongside stemming. To summarize: in stemming, we just remove the suffixes, right? And in lemmatization, along with removing suffixes, a word is produced which makes more sense of those combinations. I think with this diagram, it is very easy to remember the difference between stemming and lemmatization. Correct.
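If you want to try this yourself, here is a minimal sketch using NLTK's Porter stemmer and WordNet lemmatizer. Exact stems vary by algorithm (Porter, for instance, may produce 'studi' for both 'studies' and 'studying'), so treat the outputs as illustrative rather than matching the slide exactly.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # the lemmatizer needs the WordNet corpus

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "studying", "changing", "changed", "changes"]:
    stem = stemmer.stem(word)
    lemma = lemmatizer.lemmatize(word, pos="v")  # pos='v' treats the word as a verb
    print(f"{word:10} stem: {stem:8} lemma: {lemma}")
```

Notice how the stems need not be real words, while the lemmas ('study', 'change') always are; that is exactly the part-of-speech-aware difference just described.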
So if I take you now to the parsing techniques quiz that is there in Salesforce Trailhead, I think you will now be able to take it as well. Let me share the screen. Yeah, so guys, here you can see this Trailhead module. If you can scroll up. Yeah, one second. Here it is. Can you go back one more time? Yeah, one more time. Yeah, here it is. So we have covered... Yeah, go ahead. We have covered generative AI basics in the previous lectures. Now we are onto these natural language processing basics. After that, we will go on to the data fundamentals for AI. So this is the last lecture on NLP that we are doing today. Yeah, so here we have two modules, right? Have you covered both the modules? No, we are covering the second module. The previous one just had this Alan Turing test, which I have already discussed in a previous lecture. Okay, okay. So if you come over here: 'Which NLP technique uses parts of speech to more accurately determine the root of a word?' This is how the questions can come. Yeah, I can... So you can... Okay, you decide. Yeah, I think it is lemmatization. Then you have: 'What is the term for finding the underlying structure of text in NLP?' It cannot be parts of speech, because that is a grammatical concept, not a technique, so we can rule it out. Sentiment is about how humans have emotions and how they express them, so that one cannot be it either. Morphology is where you combine and uncombine words to generate new words, so this is also not going to be it. The term for finding the underlying structure of text is parsing. Okay. That is what we have just studied. And this is how we can go... Let's see. Yes. Yeah, both were correct. If you can cover the first one also. Can we have it again? I'll go, I'll go. Yeah. I think it is there. If you scroll down... Okay. So you have natural language processing over here; that I have already covered. Then there are the everyday uses of NLP, which we are going to take up right now, and then we will take the use cases of NLP. Then there is how it has evolved since 1950; you can see that here. There is a brief story about Alan Turing, and we have already covered this experiment in our Day Zero session, where artificial intelligence was introduced. So viewers can watch Day Zero, where it has been explained very well at a glance. Yeah. For those who are interested and joined in between: if you're watching this bootcamp session for the first time, we request that you go through it from Day Zero so that you can relate all the topics, right? Having the AI Associate certification is becoming a nice-to-have. And if you don't like reading theoretical material and want to listen to someone explaining the topics, I think this channel is the first channel in the entire Salesforce ecosystem where you will find AI-related sessions totally free of cost. So utilize all those sessions step by step so that you can learn things quickly. We had also discussed what natural language is. Can you answer: what is natural language? According to you, what would it be? I think number two, the way humans communicate. Right. And then if we go on to the next one: in what ways have neural networks impacted NLP? A and B, so the answer will be D. I think we should be correct. Yep. Okay. So I think we are good with this topic now. Anything else that you want to cover? Yes, of course. I'll just end with the use cases; that's where we had stopped. A couple of slides and I'll wrap up. Okay. Yeah, we have time. There are two more slides, please. Yep. We have time, so you can go ahead. Okay. Now you have the use cases of NLP. The first one is email filters. Email filters come in whenever there is a spam sort of email. There are a lot of shopping sites constantly sending you messages through email: you can buy this, you can buy that, we have a good offer. What your email filters do is recognize certain patterns in those texts which indicate spam: certain words, certain phrases, certain texts. That's how email filters work and filter out all the spam, so that you have only a scrutinized set of mails in your inbox, with the rest going to the spam box. That's why sometimes even mails which are important land there, and senders ask you to check the spam mailbox, because there is maybe a ten percent chance that the email they sent has gone into the spam box. So that's the email filter thing. Then you have text prediction. You are sharing the Trailhead page, so you may want to share the slide instead, because otherwise the audience will think the session has ended and wonder what she is talking about. Here you have the use cases of NLP. So the first one is email filters. In email filters, what do we have? I was explaining the email filters. In email filters, we have a thing you could call spam analysis. Now, this is not written there as such, but what we have through email filters is that we find out what kind of text, what kind of words relate to spam. As I mentioned, whenever a shopping website is constantly sending you mails, they are automatically sent to a different mailbox, which is the spam box. That's how the filters categorize spam: the words, text, and phrases are analyzed by the email filters. So I have an explanation for it: the first use case of NLP was email filtering. It began with spam filters uncovering certain words or phrases that signal a spam message. Then you have Gmail's email classification. It recognizes that mails belong to one of three categories, primary, social, or promotions, based on their contents, for all Gmail users. This keeps your inbox at a manageable size, with the important, relevant emails you wish to review and respond to quickly. This is what email filters do.
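As a crude illustration of that first generation of word-based filters, here is a minimal keyword-scoring sketch in Python. The word list and threshold are made-up assumptions for illustration; real filters use statistical and machine-learning models rather than a fixed list.

```python
# Hypothetical list of words that often signal spam
SPAM_HINTS = {"offer", "buy", "free", "discount", "winner", "sale"}

def spam_score(message: str) -> float:
    # Fraction of words in the message that look spammy
    words = [w.strip(".,!?").lower() for w in message.split()]
    hits = sum(1 for w in words if w in SPAM_HINTS)
    return hits / max(len(words), 1)

mail = "Buy now! Free discount offer just for you."
score = spam_score(mail)
print(f"score={score:.2f} ->", "spam" if score > 0.2 else "inbox")
```

A message whose words overlap heavily with the hint list gets a high score and is routed to the spam box, which is the basic pattern-recognition idea just described.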
Then you have the next thing, which is very, very important: text prediction. I have made these snapshots because I wanted to exemplify NLP text prediction. As you can see, I have written a mail to a random demo email address, where I have typed some text: 'This mail is to remind you of the negotiation that we have had in our...', and while I was typing the 'l', it had already predicted 'last meeting'. You can see it is in gray, which means it is something that I have not written; the email client has predicted that output. This is what text prediction looks like. Then you have the next one: 'looking forward to your...'. I had written 'looking forward to your', and then 'positive' appeared. I had not written 'positive'; the client understood by itself that this would be a probable next word here. So this is how text prediction works. And the final one: after writing 'Yours', it had already predicted that 'sincerely' would come. So this is how text prediction works in email.
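Under the hood, one classic way to make suggestions like these is a next-word language model. Here is a toy bigram sketch in Python; the tiny training corpus is made up for illustration, and real systems like Gmail's Smart Compose use far larger neural models.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus (real systems train on huge email corpora)
corpus = ("looking forward to your reply . looking forward to your reply . "
          "looking forward to your positive reply . yours sincerely .").split()

# Count, for each word, which words follow it and how often
followers = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    followers[w1][w2] += 1

def predict_next(word: str) -> str:
    # Suggest the most frequent follower seen during training
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else ""

print(predict_next("your"))   # -> 'reply' (seen twice, vs 'positive' once)
print(predict_next("yours"))  # -> 'sincerely'
```

The gray suggestion you see while typing is simply the model's highest-probability continuation of what you have written so far.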
Last, we have smart assistants as a use case of NLP: Apple's Siri and Amazon's Alexa. I have discussed them numerous times, and again I will say that whenever you speak to them, you should know exactly what you want. When you communicate with them verbally, you have to be pretty clear, and then they are going to analyze the patterns in whatever you are saying. When you say 'play this song' or 'play that song', they are going to do so. However, if you ask them to play your favorite song, they might be a little stuck, because they don't know what your favorite song is. Either you make a list of your favorite songs and input that, or you will get no output, because how would they know your favorite song? They can recommend songs on the basis of what you have listened to many times, but that is not always going to be a spot-on answer for what your favorite song is. So they can recommend what your favorite song might be based on all the songs you have heard before, but to get a spot-on answer, you have to tell them that this is your favorite song and this is what to play when you ask for it. And how do we talk to Siri? If you want to ask anything, you just activate Siri by saying 'Hey Siri' and then ask your question. It understands what you said and responds with a relevant answer based on the context. So this is how NLP works, and that's it for the day. Okay, so I think you covered a lot, and in this session NLP is fully covered, nothing is left, right? Okay, and what will be the next topic? The next topic will be data: what is the significance of data in AI? Okay, and that encompasses, I guess, 38% of the program. Right, we have four modules. We already completed the fundamentals of AI first; generative AI, predictive AI, and NLP, these three topics we have already covered; and Data for AI, I think, is the fourth module, right? Okay, yep. So I think those who are following all the sessions are understanding, and if you keep following them, I'm sure that by following only these sessions you will be able to crack your AI Associate certification. I just want to say thanks to Nikita for preparing all the content and sharing all the knowledge freely with the community. So thanks for that, and I know you are planning to do one more bootcamp that will be more practically oriented, right? Here we are just discussing the theoretical concepts, but in that bootcamp, she will first explain what Python is and how we can use Python to build models in AI, right? So we are planning that as well. Maybe this month we will complete this bootcamp first so that at least you can target the AI Associate certification, and then we will plan that bootcamp so that, at a broad level, you can understand how something can actually be built in AI, right? Okay, so that's it for today. Thank you for joining this session, and if you're watching the recording, thanks to you as well. Keep following all the sessions if you want to clear your AI Associate certification. Okay, thank you. Thank you, everyone. Thank you, Nikita. We'll see you in the next session, guys.