Okay. Hello, everyone. I'm Sanjay Gupta. Welcome back to Sanjay Gupta Tech School. As you all know, last week we started this AI bootcamp, and I have Nikita with me. So first of all, welcome to the channel, Nikita. Last week, Nikita briefly explained what AI is. In today's session, she will explain the concepts in a little more detail. So this is the day one session of the AI bootcamp. Initially, the sessions will explain the concepts of AI, so bear with us so that you can understand all of them. Along with the concepts, she will also explain different algorithms, so that you can understand how algorithms actually work in AI. Okay. With this note, I'll jump to the next slide. If you want to know about Nikita, you can go through this slide. She's active in the areas of data science, AI and ML, she has done a lot of research, and she's going to share that research with you across different sessions. She basically helps students, freshers and professionals grow their careers and learn algorithms and AI in detail. With this note, I'll pass it on to Nikita so that she can start our day one session and you can understand what AI actually is. So, over to you.

Thank you, sir. In the past session, we discussed three things in a nutshell: what artificial intelligence is, what machine learning is, and how machine learning is a subset of artificial intelligence. Then we moved on to deep learning, touched on it briefly, and I told you how these fields are subsets of each other. So the previous session was a quick overview. This one goes a little deeper into machine learning: we are going to learn about data, because data is the lifeblood of ML, so we need to understand it.
We'll see how many types of data there are, and then how I'm going to enable my machine to learn from this data. If I show it the data, how is the machine going to cope with it? And how am I going to define the learning, or instill some intelligence into my machine, so that it can become independent of human intervention and we don't need to step into how the process works? So let us see what types of data we actually have. Data is broadly classified into categorical and numerical data. Categorical data is further divided into nominal and ordinal. You might need to refresh these terms, because you studied them in 10th and 11th grade if you had statistics as one of your subjects. This is just one slide of statistics, but if you want to dive into data science, you definitely need a knowledge of statistics. So, as I said, categorical data is classified into nominal and ordinal. Nominal data can be something like colours, because we cannot order them. We can't say the first colour is red, the second is blue and the third is yellow. It's data, but we cannot put it in order, so a typical example of nominal data is colours. Then you have ordinal data, which can be ordered. For example, you have designations: junior designations, then mid-level designations, then senior designations. Or when you categorize data into low, medium or high, that kind of data is called ordinal data. Now we move on to the more important type, because that's what you're going to use in ML regression: discrete and continuous data, which are numerical in nature. Since this data is numerical, you will see numbers, but one of the two types is discrete.
Discrete data takes only whole values, like 2, 3, 5, 7. The number of children in a family, for example, is discrete data. Continuous data also takes decimals into account: 2, 2.4, 25, 25.7, 50 and so on. For example, if you take somebody's weight into consideration, it could be 50 kg or 51.5 kg. You won't always have whole figures; you will have decimal figures as well, like 55 kg or 75.5 kg. So you won't have a discrete set of numbers but a continuous range of numbers, and this is where regression comes in. This is how we will study regression. Now let's see how machine learning algorithms and techniques are broadly classified. The broadest categories are supervised and unsupervised learning. Semi-supervised learning sits in between, and there is also reinforcement learning. Let's put semi-supervised and reinforcement learning on pause for now and discuss supervised and unsupervised learning. So what do we have in supervised learning? We have labelled data: you are given some inputs and also the corresponding outputs, and that's how you train your machine, showing it how it has to work. I'll explain how supervised learning works in the upcoming slides. Unsupervised learning, on the other hand, is divided into clustering, anomaly detection, dimensionality reduction and association rule learning; these are the four types of algorithms described under unsupervised learning. Now let's see how supervised algorithms work. Broadly, in supervised ML, as I mentioned, both the inputs and the outputs are available. But let's look at the data set first.
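Before looking at that data set, here is a minimal Python sketch of the four data types described earlier, using made-up example values (none of them come from the slide). Mapping ordinal designations to ranks and one-hot encoding nominal colours are common preprocessing conventions assumed here for illustration; the session itself doesn't prescribe an encoding.

```python
from statistics import mean

# Made-up example values, one list per data type described in the session.
colours = ["red", "blue", "yellow", "blue"]            # nominal: categories with no order
designations = ["junior", "senior", "mid", "junior"]    # ordinal: categories with an order
children = [2, 3, 0, 1]                                  # discrete: whole-number counts
weights_kg = [50.0, 51.5, 55.0, 75.5]                    # continuous: decimals allowed

# Ordinal data can be mapped to ranks because its order is meaningful.
DESIGNATION_RANK = {"junior": 0, "mid": 1, "senior": 2}
designation_ranks = [DESIGNATION_RANK[d] for d in designations]

# Nominal data has no order, so each category gets its own 0/1 column (one-hot encoding).
colour_levels = sorted(set(colours))
colour_one_hot = [[1 if c == level else 0 for level in colour_levels] for c in colours]

print(designation_ranks)   # [0, 2, 1, 0]
print(colour_levels)       # ['blue', 'red', 'yellow']
print(colour_one_hot[0])   # [0, 1, 0]  ("red")
print(mean(weights_kg))    # 58.0, averaging is meaningful for numerical data
```

The design point is that only ordinal data gets ranks: giving colours the numbers 0, 1 and 2 would invent an order that nominal data doesn't have.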
You can see that this data set is about whether students pass or fail. One column is the students' playing hours, for example seven, and the other is their study hours, for example one. You probably know that a student who puts seven hours into playing and one hour into studying will not get the desired results and will most likely fail, so I have shown a zero for fail. So I have labelled the data: I have tagged which rows pass and which fail, and this is how the data needs to be classified. That is supervised learning. Now let me give you an idea of unsupervised learning. Those of you who use iOS devices: when you click pictures of people, I'm not talking about landscapes, but pictures of yourself and other people, you might have noticed that when you browse your photos, the system groups them and segments different people. It doesn't know who Nikita is, who Rajesh is, or who someone else is. It doesn't know their names, but it knows what each of them looks like. So ML has been at work for a very long time; it hasn't just started working now. It's just that we have started to notice it day and night. So in the iPhone Photos app, you see different people, and your images have been segmented.
For example, there are pictures of you with your family, and you will see that your portraits are grouped separately, your friends' portraits are grouped separately, and your kids' portraits are grouped separately. Now you can label them: this is me, this is Nikita, this is someone else. That is unsupervised learning: your machine has segmented different people by itself, and only afterwards do you add a label, say "this is Nikita". From then on, the machine can identify Nikita in whatever photo she appears in. That is how the segmentation works. Let's learn about supervised ML for now, and then we'll switch to unsupervised. In the classification example, we see that seven hours of playing and one hour of studying lead to a zero, that is, a fail. Then five hours of playing with only two hours of study time also doesn't work, and it also gives a zero. But when a student has two playing hours and three study hours, we can infer from the data that the student will pass. I'm not talking about other conditions here. We don't consider things like the student's state of mind, because our independent variables are the students' playing hours and study hours, so we won't consider any other variables. Then comes the dependent variable, which is pass or fail. So we have classified the data, and this is what a supervised classification model does. Correct. So, when we talk about classification, what do we mean?
Classification is a fundamental task in ML and data analysis that involves grouping or categorizing data into distinct classes or categories based on the characteristics, or features, of the data. Here, the characteristic of the data was that it led to failure when only zero to two hours were devoted to study while four to seven hours were devoted to playing. Classification is a supervised learning technique, which means it requires a labelled data set for training. Always remember that supervised ML techniques require labelled data sets for training. The primary goal of classification is to build a model that can accurately assign new, unseen data points to the appropriate class or category. So always remember: you give some training data to your machine, your machine learns from it, and then you test it. You look at its results, and once the testing is done and the algorithm correctly identifies the classes, you can sign it off and put it to use. Correct. Now, how does regression fit into supervised ML? Let's take another data set, which gives us the expected salary. Degree is the first variable and experience is the second. These two are the independent variables, and the last one, salary, is the dependent variable. So we can observe that my salary depends on my experience and the degree I hold. A BA degree with 3 years of experience leads to a 35k salary, and a BE degree with 5 years of experience leads to a 79k salary. This is a made-up data set and doesn't reflect current market values; we created it just for the sake of explanation.
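The train-then-test idea for the pass/fail example can be sketched in a few lines of Python. The training rows are hypothetical values in the spirit of the slide, and the model is 1-nearest-neighbour (a new student gets the label of the most similar training student), a simple classifier swapped in purely for illustration; the session hasn't introduced specific classification algorithms yet.

```python
import math

# Hypothetical labelled training data in the spirit of the slide:
# (playing hours, study hours) -> 1 = pass, 0 = fail.
train = [
    ((7, 1), 0),
    ((5, 2), 0),
    ((6, 1), 0),
    ((2, 3), 1),
    ((3, 4), 1),
    ((1, 5), 1),
]


def predict(features):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min(train, key=lambda row: math.dist(row[0], features))
    return label


# Hypothetical unseen students, held back to "test" the model after training.
test = [((6, 2), 0), ((2, 4), 1)]
correct = sum(predict(x) == y for x, y in test)
print(f"accuracy: {correct}/{len(test)}")
```

With a real data set, you would hold back a larger share of the labelled rows for testing and report the accuracy on those before putting the model to use.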
You can see that here you have different values like 35k, 79k and 92k. A data scientist might earn 102k; your salary could be 101k, 102k or 200k, and you can't know in advance. These are all continuous numerical values, and this is what regression is all about. This is how you differentiate regression from classification. Classification has a fixed set of outcomes, such as 0 or 1. Or take a data set that predicts on which day of the week a person is going to buy a car: there are at most 7 possible days, so that is again classification. Classification is about putting data into a fixed set of categories. Here, however, the values are continuous and any number is allowed, so this is a regression kind of ML algorithm.

Yeah, this definition is very important, so I think everybody should go through it. You will better understand what supervised ML and regression are.

In ML and statistics, regression is a modeling technique used to establish the relationship between one or more independent variables and a dependent variable. So you need to figure out which are the independent variables and what the dependent variable is in the data set. The data sets I took here are very small; you will get larger data sets to work on and apply all the algorithms to. The independent variables are also known as predictors, features or input variables, and the dependent variable is also known as the target, response or output variable.
That's why supervised learning is known as a labelled technique: you need labelled outcomes to train your machine or whatever mechanism you are working with. The final goal of regression is to predict the value of the dependent variable based on the values of the independent variables. This is done by fitting a mathematical function, or model, to the data, which can then be used to make predictions. So this was a slide about supervised machine learning. If you have any questions, you can put them in the comments.

Yeah, if you face any issue understanding what is being explained, or if you have any query, post it in the comment section so that Nikita can clear your doubt. Maybe we can continue, and if anybody has a doubt, they can ask.

Okay. There are a lot of regression algorithms that we are going to look at, but this one is simple linear regression, where we have one input variable and one output variable, or one independent feature and one dependent feature. You can also say one independent variable and one dependent variable. There's a common rule of thumb from BMI charts: if your height is 154 cm, your weight should be approximately 50 to 57 kg, and if your height is 157 cm, your weight should be around 60 kg. However, the data set I have shown here can also be used to make clusters. For example, I could make one cluster of people with heights in the range of 150 to 160 cm and another for people with heights of 160 to 170 cm. Clustering is possible for pretty much everything, so even here clustering is possible.
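Going back to simple linear regression for a moment, the idea of "fitting a mathematical function to the data" can be sketched with ordinary least squares, which sets slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The heights and weights below are made-up values chosen to lie exactly on a line so the output is easy to check; they are not real BMI data, and least squares is assumed as the fitting method because it is the standard one for linear regression.

```python
from statistics import mean

# Made-up data: height (cm) is the independent variable, weight (kg) the dependent one.
heights_cm = [150, 155, 160, 165, 170]
weights_kg = [50, 54, 58, 62, 66]


def fit_line(xs, ys):
    """Ordinary least squares with one input feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations from the means, and sum of squared x deviations.
    cross_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sq_dev = sum((x - x_bar) ** 2 for x in xs)
    slope = cross_dev / sq_dev
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(heights_cm, weights_kg)
print(slope, intercept)            # 0.8 -70.0
print(slope * 175 + intercept)     # predicted weight for a 175 cm height
```

Multiple linear regression extends the same idea to two or more input features; in practice a library routine such as scikit-learn's `LinearRegression` handles that case.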
The clustering I'm talking about refers to unsupervised machine learning, which we'll get to after this. The weight you are seeing in simple linear regression depends on the height: as I already mentioned, at a height of 154 cm, your weight should be around 57 kg. So this is one independent feature and one dependent feature. In multiple linear regression, however, there are multiple input features, say two or three independent features, but usually just one dependent or output feature. Correct. So this is simple linear regression.

I have one question here. We have height and weight, so which one will we consider independent and which one dependent?

The weight depends on the height, so height is the independent feature and weight is the dependent feature.

Okay, but there can be a scenario where we reverse it: weight independent and height dependent.

Yes, but then there will be multiple other factors to consider as well. So for that, we need to do some research on which can be the independent features and which can be the dependent one. There are a lot of data sets available on different websites. You can even search for "machine learning data set repository", and you will get a collection of all the data sets that are available. You can check and click on any of them. If it were possible, I would have done it here, but never mind, I will make a video and put it up. You can also search Google for machine learning algorithm and data set repositories and check out a lot of data sets.
Pick any one of them, and when you look at the data set, you will be able to identify whether it is a classification or a regression data set. If the outputs are continuous numerical values, you will realize it's a regression problem, and if there is only a particular set of outputs, you will see that it's a classification data set.

Yeah, in my opinion, once we have covered all the types, it would help if you could demo those data sets live, showing which website they come from. If you can do some research on that, I think it will help people understand. Once we cover all the types, if they go and look at those data sets, it will make more sense, and people will be able to identify that this data set is of this type and that data set is of that type. So I think we can share that with the audience in upcoming sessions.

Definitely, we can do that.

Okay, so do you want to cover unsupervised learning today?

Do we have time?

Yeah, we have 10 to 15 minutes. So maybe you can start with an introduction, relate it to supervised learning, and if there's content left, we can cover it in the next session.

Sure. Let me sum up what we did in supervised machine learning for classification. In supervised machine learning, I discussed two types of data sets with you. One of them had zero or one, that is pass or fail, as its output: it was a labelled data set whose output, pass or fail, was based only on the independent features, or independent variables, which were study time and playing time.
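The rule of thumb above (continuous numerical outputs suggest regression, a fixed set of outputs suggests classification) can be sketched as a small Python helper. This is a rough heuristic of our own, not a standard library function: the 0.5 threshold on the share of distinct values is an arbitrary assumption, and on real data you should still check what the output column actually means.

```python
def guess_task(targets, max_distinct_ratio=0.5):
    """Rough guess at whether a target column suggests classification or regression.

    The 0.5 threshold is an arbitrary assumption, not a standard rule.
    """
    if not targets:
        raise ValueError("need at least one target value")
    distinct = set(targets)
    # Text or True/False labels such as "pass"/"fail" are categories.
    if all(isinstance(t, (str, bool)) for t in distinct):
        return "classification"
    # Fractional values such as 51.5 only make sense on a continuous scale.
    if any(isinstance(t, float) and not t.is_integer() for t in distinct):
        return "regression"
    # Whole numbers: a few repeated codes (0/1, day 1-7) look like classes,
    # while mostly unique values (salaries) look like a continuous quantity.
    ratio = len(distinct) / len(targets)
    return "regression" if ratio > max_distinct_ratio else "classification"


print(guess_task([0, 0, 1, 1, 0, 1]))       # classification (pass/fail coded as 0/1)
print(guess_task([35000, 79000, 92000]))    # regression (salaries)
print(guess_task([50.0, 51.5, 55.0]))       # regression (weights in kg)
```

A tiny sample can fool any rule like this, which is why the output column's meaning matters more than its values.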
And we got to know what classification actually does. When I first came across these terms back in 2021, when I started learning, I didn't know how to tell clustering, classification and regression apart, but I went on to a certain level, realized what they mean, and then found data sets that explained them even better. That's what I'm trying to do here: give you the clearest explanation of everything I have covered so far. Then we moved on to regression, the other technique I was talking about. I told you that when the outputs are continuous numerical values, we treat the problem as regression. In that example, degree and experience were the two independent variables, and the dependent variable was salary. Salary is a continuous numerical variable, hence we used regression. Now we will move on to unsupervised learning, where we will discuss the third important technique, clustering. In unsupervised learning, we do not need any labels. As I mentioned with the iPhone operating system, your photos are segmented without needing any label; they are segmented based on how each person looks. Look-alikes are grouped together, so you get different sections, each containing one look-alike group. Then, once you name a person, the images are labelled with that name. That's how the segmentation of images works. Here, too, you have been given some data. Look at it very carefully: there are people aged 25, 27, 24 and 28.
Then you also have their salaries, around 25k, 50k, 40k and so on. Their spending score, or what I call the spending rate, is also given: 5, 3, 7 and 1. Now let's say I'm a person starting a business, and I want to reach the maximum audience. But of course, not an audience that isn't willing to buy. If their spending rate is 0 or 1, what am I going to do with them? I want the people who can benefit me the most and who can benefit from my product. So this is what we look at when we talk about business. Here we can see that the spending rate is 5 for the person with a 25k salary. This person has 25k in hand, and his spending rate is higher than the 1 and the 3. Then you have a person with 50k who spends less than the 25k person. And then you have a 40k person whose spending is even higher. So if I owned the business and wanted to give a discount, I would certainly offer it to the first and the third person. That's why Zomato, Swiggy, Ajio, Myntra and businesses like these target the people who place the most orders. That's why, when we have items left over in our cart, a notification from Ajio comes in saying you have something in your cart to buy. These are the kinds of notifications we get now. So you can see the importance of the ML engineers and AI engineers working in those domains: people with knowledge of data science can make a business grow far more than it could without the data science part.
So as I mentioned, whoever gets the most notifications, say from Zomato, is actually among the biggest buyers of the product. If your spending rate is 5 or 7, you are being targeted, and you will see different discounts. You may notice that some of your friends have a 20% discount in their app that you don't have. That means their spending score is higher and they buy the products these companies offer more often. So this is how clustering works here. It groups the 5 and the 7 together and separates them from the rest: one cluster is for the people who have bought the most, and another is for the people who have bought the least. I am going to target the cluster of people who have bought more from the website, whose spending rate is higher than that of the people in the other cluster. This is how unsupervised learning works: it groups together the people with higher spending habits and, separately, those with lower spending habits. My focus would be on those with higher spending habits, because that's the audience I'm looking for, and they will benefit my business. And this is just about the four businesses I talked about. In this world we have so many businesses, and wherever you go, you will see why it matters to understand what this lecture is about. Correct. Note that these variables don't depend on each other; they are separate variables. Clustering here is based on the spending rate, just one of the variables, and this is what is going to help my business expand even more. That is the practical application, or use case, of unsupervised ML, and the technique is clustering.
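The customer grouping described above can be sketched with k-means, a common clustering algorithm that the session hasn't named, so treat that choice as an assumption. The spending scores 5, 3, 7 and 1 are the ones from the slide; using two clusters, spacing the starting centroids evenly, and running a fixed ten iterations are illustrative assumptions.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Plain k-means on a single feature. Returns (centroids, labels), where
    labels[i] is the cluster index assigned to values[i]. Assumes k >= 2."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: spread the k starting centroids evenly from min to max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


spending_scores = [5, 3, 7, 1]   # one score per customer, from the slide
centroids, labels = kmeans_1d(spending_scores)
high = centroids.index(max(centroids))          # the high-spending cluster
targets = [i + 1 for i, label in enumerate(labels) if label == high]
print(centroids)   # [2.0, 6.0]
print(targets)     # [1, 3] -> the first and third customers
```

No labels are given to the algorithm; it only groups similar scores, and the business decision of which cluster to target is made afterwards. In practice you would cluster on several columns (age, salary, spending score) after scaling them, usually with a library implementation.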
So let's talk about the definition now. Clustering is a type of unsupervised machine learning technique that involves grouping similar data points into clusters, or categories, based on their inherent characteristics or patterns. Unlike supervised learning, where data is labelled with predefined categories, in clustering the algorithm discovers natural groupings within the data without any prior information about these groups. So the machine looks for common tendencies in the data, groups the data points by those tendencies, and then we decide which group we actually want to target. The primary goal of clustering is to find structure or patterns in data, which makes it a useful technique for data exploration, pattern recognition and data segmentation. I already discussed image segmentation, but there is a lot more to discuss about it, which we will take up in upcoming lectures.

Right. So you covered supervised ML and unsupervised ML. One quick question from my side: how do we decide which one to choose? If you can relate it to a use case.

Yeah, as I mentioned, it depends on your data set. If the output, or label, has not been given, that is, your output variable isn't there, you are in unsupervised territory. Then we look at the data itself: as I mentioned, here you have the spending rate, one of the features I will make my algorithm work on. That's the variable I look for, because I'm the one who wants to grow the business. If I went into a different business, I would look for a different data set, different audiences and their different behaviours. Spending is just one behaviour we've talked about; customers definitely have others. So, yeah, that's what it is about. Okay.
So I think this is it for today's session, and in the next session we will cover some other topics. If you could show what you will cover in the next session, it will give people some idea.

We are going to discuss the algorithms for the regression and classification we talked about today. We are going to get into these algorithms and take different case studies and data sets to differentiate them, and, as I mentioned, see how you tell regression and classification apart. So we will talk about the data sets and then proceed with the algorithms themselves: regression algorithms and classification algorithms, as they are named. They will involve some mathematics too, which we'll mostly see in the upcoming sessions. But all you need for these is your 10th and 11th grade mathematics and a little bit of statistics. Okay. Just the basics. So this is all related to supervised learning, and there will be a similar set of algorithms for unsupervised learning as well.

Yes. One thing, guys, I just want to tell you: this bootcamp covers all aspects of AI. It is not limited to the Salesforce AI Associate certification; that is one part of this bootcamp, and if you follow all the sessions, you will be able to prepare for it. But the bootcamp covers all aspects of AI, so if you follow all the sessions, you will learn different algorithms session by session. Nikita will explain all of these algorithms, including how each algorithm basically works. So do follow all the sessions if you want to upskill yourself in the AI era. Anything else you want to add before we wrap up?

No, that's it. I have explained everything for today.

Okay. Okay, guys, thank you for joining today's session.
If you're watching the recording, thanks to you as well, because I know many professionals also watch these sessions. So if you're at the office and watching it at some other time, please take out some time and follow all the sessions so that you can upskill yourself. And thanks to Nikita for sharing her knowledge. Okay. If you want to reach out and ask questions after the session, feel free to post them in the comment section, and we'll try to pick up those questions in the upcoming sessions. Okay. Thank you so much. See you in the next session. Bye, everyone.