Shweta here, ex-SSME, ex-Red Hatter, and currently a student studying NLP and Artificial Intelligence at the UK's University of Sheffield. I was invited to give a short presentation on NLP, since that's my area of specialization and the topic of my dissertation. So, let's start. NLP is all the hype these days thanks to ChatGPT; all of us must have used it at some point or another, and I use it often for my assignments.

In this session we will cover an introduction to NLP: what NLP is all about, why NLP matters, and what its significance is in software and in open source. Then we have the most hyped topic on the market, ChatGPT. Why exactly ChatGPT? I'm not all for ChatGPT, so I will be presenting its cons here instead of its pros, because all of us already know the pros; we have been using them. Then I will showcase some uses of NLP in industry and what use cases we can have. Since this is a very short session I won't be able to show any demos, but I do have GitHub links for code I have already developed for a few of my projects. In your free time, please feel free to take a look at that code and see what exactly NLP is all about.

Before diving into NLP, let me give you the basics of machine learning, because NLP uses machine learning extensively. Machine learning is all about having your systems make decisions for you without being explicitly programmed; that's essentially the standard definition. It comes in two categories, supervised and unsupervised. Supervised learning is when we train a model on labeled data, where each example comes with an annotated answer. Unsupervised learning is when there are no labels on the data at all.

So what is NLP? It is the field where computer systems understand human language and human speech, generate language, and analyze huge amounts of text to make sense of it: get the context of what is being said, process it, and give us outputs based on that context. NLP is a very, very vast field, and there are only four things we will concentrate on in this session, the major areas where research has been focused.

The first one is language understanding. All of us must have used Siri, Alexa, or other voice assistants; those are devices that interpret your speech and answer or provide outputs accordingly. Then we have text analysis and information retrieval, which is where sentiment analysis comes in. All of us use Netflix, right? The recommendations you get on Netflix are based on sentiment analysis algorithms: the more data you give it, the more thumbs-up signs you give the content you're watching, the better it gets at predicting what you'll like.
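To make the supervised-learning idea concrete, here is a minimal sketch of a labeled-data sentiment classifier in Python with scikit-learn. The texts and labels are made up for illustration, and a real recommendation system is of course far more sophisticated than this.

    # Tiny supervised example: labeled texts in, sentiment prediction out.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["loved this show", "great plot", "boring and slow", "terrible acting"]
    labels = ["pos", "pos", "neg", "neg"]  # the labels are what make this supervised

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["what a great series"]))  # toy data, so take the output lightly

The point is just the shape of supervised learning: examples paired with labels go in, and the fitted model predicts labels for new, unseen text.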
Then we have natural language generation. This is ChatGPT, chatbots, and the virtual assistants we have all seen, even on bank websites: you ask the chatbot what amount you've spent this month, and it shows you what you've been spending and what's been happening with your account statements. We call it natural language generation because the system not only interprets what you are saying but also gives you output in a human-understandable way. You put the input in in human language, and you get the output back in human language. The fourth area is future innovations: I will talk about the social contributions NLP has made and how we are implementing it for social causes. Those are the dissertation topics I'm working on, and they will be covered in the last slide.

So why is NLP automation significant? Because it bridges the gap between human language and speech on one side and our automation processes and machines on the other. In that way it improves customer experiences, and it helps you make data-driven decisions: if you have a huge amount of data, you give it to an NLP model, it makes sense of it, and it gives you a prediction as output. In industry it helps you realize, even from feedback or from huge amounts of text, what a customer is actually trying to ask, or what an open source developer is trying to achieve with a piece of documentation or a comment saying what some code does.

Okay, the examples now. These come from my experience working as a subject matter expert and as an open source developer, so please take whatever resonates with you. The first one is machine translation. Suppose we were working on support cases and a case came in from another region, let's say China or Korea, in a language we don't understand. We would have to wait for that region's engineers to come online and take up the case, because cases always carry sensitive data; you cannot just put it into Google Translate. That's not how it works. Instead, we could have an API of our own. Red Hat could have its own API, maybe an open source one, that would translate and analyze these cases so we wouldn't have to wait for another human to pick them up.

For open source more broadly, there is localization of software. Some high-traffic user interfaces, like train and flight booking sites, already offer it, but more websites should, because English may be the commonly used language, but it should not be a barrier in any way. If there is a flight booking website and somebody trying to use it cannot understand the buttons or what exactly is happening on the page, there should be an NLP API for that. We could develop one; it takes effort, but it would be a long-term investment that would let you localize a lot of interfaces, converting the language in a jiffy.
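To show what such an in-house translation step could look like, here is a rough sketch using an open-source model from the Hugging Face hub, so the text never has to leave your own infrastructure. Helsinki-NLP/opus-mt-zh-en is a real public Chinese-to-English checkpoint, but any self-hosted translation model could stand in, and the sample sentence is just an illustration.

    # Sketch: translate case text locally with an open-source MT model,
    # instead of pasting sensitive data into an external service.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
    result = translator("服务器无法启动")  # sample input: "the server cannot start"
    print(result[0]["translation_text"])

Because the model runs wherever you deploy it, the sensitive-data objection to public translation services goes away.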
Then we have knowledge base automation. We have our KCS articles; all of us must have used them at some point. Today the search over KCS is generalized: we effectively do a Google search for KCS articles. Instead, for each product the company could have its own knowledge base where you search with a few keywords, and an NLP model we have developed predicts exactly which articles you are looking for, rather than you running a generalized Google search. It would be faster and more efficient. I'll show a small sketch of this idea after running through the rest of the examples.

Then we have social media monitoring, which is about trends. For example, when the Log4j vulnerability happened a few years back, we were flooded with cases, and every customer was asking the same questions: how exactly can we resolve it, and how exactly is our product impacted? If we had those social media trends in hand, we would be able to allocate more resources accordingly and take whatever steps were required at the time.

Then we have virtual assistants. These could help us as developers when we are making open source contributions. If it's a GitHub codebase that's new to you, it becomes really difficult to debug quickly and understand which classes are used where. For that we could have virtual assistants guiding you through the code, pointing you to the package or module where the relevant code lives. It won't be 100% accurate, because NLP doesn't give you 100% accuracy; it's around 70 to 80% depending on the model, but it would still help you navigate the codebase.

Then we have descriptive analytics and automatic insights; I'll go through these really fast. For descriptive analytics, have we heard of Sonar, the code quality tool? Tools like that are static and rule-based: a fixed set of rules analyzes the code, with nothing dynamic going on. And a good share of the time they get it wrong, flagging code as incorrect when, say, a particular API call is actually necessary. Using NLP we could build a more dynamic code quality tool that would help us analyze our code better. Automatic insights, again, is feedback and rating analysis: whatever is written on cases or about releases, what customers are expecting, what the open source community is talking about, what the next trend or feature should be. Those are things we could take into account.
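Here is the small sketch of the per-product knowledge base search I mentioned. It uses the open-source sentence-transformers library with a real public encoder model (all-MiniLM-L6-v2); the article titles and the query are made up for illustration.

    # Sketch: embed KB articles once, then rank them against a user query
    # by semantic similarity instead of keyword matching.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small open-source encoder

    articles = [
        "Pod stuck in CrashLoopBackOff after upgrade",
        "Configuring TLS certificates for the router",
        "Tuning etcd performance on large clusters",
    ]
    article_emb = model.encode(articles, convert_to_tensor=True)

    query_emb = model.encode("my pods keep restarting", convert_to_tensor=True)
    scores = util.cos_sim(query_emb, article_emb)[0]
    best = int(scores.argmax())
    print(articles[best], float(scores[best]))

Notice that the query shares almost no wording with the matching article; the embedding space is what connects "keep restarting" to "CrashLoopBackOff", which is exactly what a generalized keyword search tends to miss.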
Now, moving on to GPT. GPT stands for generative pre-trained transformer. It's a huge topic and very difficult to explain briefly, but transformers are architectures that go beyond plain neural networks; if you have heard of neural networks, transformers are a step ahead. GPT is based on sequence-to-sequence modeling and takes larger contexts into account. That's all I can explain right now, because it's a really, really big subject.

Here are the cons I have found while working with GPT. First, there is no real understanding whatsoever. The model often doesn't grasp what I'm trying to say; it misses the context most of the time. Second, it is sensitive to phrasing: ask the same question with different wording or a different word order, and it may fail to make sense of it or answer accordingly. You more or less have to be grammatically correct to get an answer out of it.

Then there is bias. Every time you give the system an input, it learns from it. So if I as a user am biased, say toward a certain political agenda, and I keep adding that to the system, it will give me output with that bias, and every time I access the system again I will keep seeing it, because that's how the generative model behaves. It differs for each user, but if you as a user try it again and again, you will be able to see that bias. Try it; you will see it instantly.

Then we have no real-time knowledge. Of course it doesn't have any; it's not a human being. And there is no reasoning or explanation, which is something we have been working on in NLP research, because reasoning and explanation are really important: if you give someone an answer and cannot justify why the answer is what it is, it doesn't make sense. That's a very active area of research in NLP. Then there's misinformation and inaccurate responses. Has any of you ever encountered an inaccurate response? For me it's mathematical questions. Because I am a student right now working on multiple assignments, I input mathematical questions often, and it always gives me an incorrect answer. In my experience it's not even 70 or 80% of the time; it's 100%. So I would suggest never using it for mathematical equations; it's simply not developed for that kind of input yet. Finally, ethical considerations: whatever we add to it gets uploaded to the OpenAI cloud, which then holds all the data we have fed it. We sometimes add sensitive information, it becomes very, very difficult to trace where that information came from, and hackers could use it if they ever gained access to that cloud.

Now, the last part of the presentation: the dissertation work I am doing, the social-cause side of NLP. There are two projects I am working on. The first one is EDOS, explainable online sexism detection. Sexism has become very prominent on Twitter, Facebook, and platforms like them. Given a piece of text, we try to identify whether it contains sexism, what kind of sexism it is, and who is accountable for it; those are the things we try to find with our research and our model.
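To give a feel for the modeling side, here is a minimal sketch of a BERT-based text classifier of the kind such detection work is built on. bert-base-uncased is the real public base checkpoint, but the classification head here is freshly initialized, so its outputs are meaningless until the model is fine-tuned on labeled data; the input text is just a placeholder.

    # Sketch: BERT encoder plus a 2-way classification head (e.g. sexist vs. not).
    # Real use requires fine-tuning on an annotated dataset first.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    inputs = tok("example post to classify", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.softmax(dim=-1))  # class probabilities; random-ish until fine-tuned

The same encoder-plus-head pattern extends to the finer-grained questions, such as which category of sexism a post falls into, simply by changing the label set.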
The second one is the AI judge. I would say acceptance of this one is 50-50. A lot of people are against it, because they see an AI judge as a social risk: people would start losing jobs, since if you have an AI judge, what's the point of having a human judge, and why would there be lawyers at all? What would a case even look like? So the AI judge is also a very interesting research area that we are working on currently.

What we do in both of these projects is the standard train/validation/test setup used for all ML algorithms. If you have, say, 100 cases given to you, you use 80 for training, 10 for validation, and 10 for testing, and then when real-time situations come in, you give them to your model and it gives you a prediction. That's how it works, whether for the AI judge or for EDOS, the sexism work.

I will be adding the GitHub links to the slides if any of you want to take a look at the code. It has sentiment analysis, it uses NLP pretty extensively, and it deals with bias and all of those things, so you could take a look at it. We have worked with BERT, RoBERTa, and other variants of BERT itself, because BERT is among the best models in NLP right now. Those are the algorithms we are currently using. So that's it, everyone. Thank you, and sorry for the delay.