Hello, everyone. I'm Sanjay Gupta, and I welcome you back to Sanjay Gupta Tech School. As you all know, we are running an AI bootcamp, and for that I have Nikita with me. Welcome, Nikita, to the channel. Thank you, sir. Today is day two of the AI bootcamp, and you will be learning one new topic. So let's jump to the next slide. I just want to tell you who your instructor is. If you want to know more about Nikita, this slide is good for that, so do go through it. She has vast experience in the field of data science, AI, and ML, and has done a lot of research in those areas. Now, on this platform, she is sharing those experiences with you so that you can also become an expert in AI and data science, right? With that note, I hand over to Nikita so she can start today's topic. Over to you. Thank you, sir. Hi, everyone. We'll first catch up, in a nutshell, with everything we have discussed so far, and then we'll move on to the first kind of model we have for you: the regression model. Within that, we'll start with the first one, which is simple linear regression, or the single regression model, whatever you may call it. So let's dive into the session and see how it goes. We discussed regression, classification, and clustering, the three types of scenarios we had for supervised and unsupervised learning. We also discussed what regression models are used for: how you can train a machine to predict continuous numerical values. That's what we said, that regression is used for predicting continuous numerical values, because decimal values will also occur. That's the reason why we use a regression model for continuous numerical values. Then we moved on to classification.
For classification, I gave you the example of pass and fail, or you can take another example; say, I took the example of the days of the week: on which day will I buy a car? As many categories as you can come up with, you can classify into those categories, and this is what classification stands for: you choose categories and predict which category your output is going to fall into. Both regression and classification belong to supervised learning. The next type of learning was clustering. In that one, we were predicting what type of a person a customer will be, and whether I am going to target that type of customer or not. That is what we are looking at, right? So here you have the complete summary of regression, classification, and clustering. Let's have a quick read of it. Regression belongs to supervised learning, the output is a continuous quantity, and the main aim is to forecast or predict. Of course, that is what we are here for: making a machine capable enough of predicting values. The example given is predicting the stock market price, and the algorithm we look at for this kind of problem is linear regression. Then we move on to classification. It is also supervised learning, the output is a categorical quantity, and its main aim is to compute the category of the data. For example, if you have to classify an email as spam or not spam, we let the model know that these kinds of mails should be categorized as spam and those kinds should not. That's why these days we automatically have a spam folder: the text of the mails is analyzed and they are put into the spam folder, right? So this is what classification algorithms do.
So the algorithm you would use for it is logistic regression, which we will study later on. The third type of thing we studied was clustering. It is unsupervised learning, and it assigns data points to clusters. We use scatter plots for such things. I am going to tell you what scatter plots are; we'll discuss them at a glance because they have a huge role in our data science work, so I'm going to explain them later on. Right now you can just understand that we make clusters depending on which customers are useful for us, which customers are going to give us the deals, so that it is beneficial for us to target them and give them, say, 20% or 30% off, and we have the maximum benefit. So the main aim is to group similar items into clusters, and there's an example saying: find all transactions which are fraudulent in nature. If you want to group together all the transactions that are not legitimate, that is what the term fraudulent stands for. The algorithm we are going to use is K-means. I know that I have not introduced you to the K-means algorithm yet, but later on, after we are done with the supervised learning regression and classification models, we are going to cover clustering models as well. So I think this should be pretty clear to you as a nutshell, or as the basics of it, right? Of course, when you dive into the core mathematics of these algorithms, you will understand even better. So yeah, I think with this slide it is very clear, and I hope, in the future, whenever folks go for an interview and this kind of question comes up, they will be able to answer it properly, because I can see the definitions in summary form, the examples, and the related algorithms as well. So yeah.
Okay, so we can jump to the next topic. That is regression. Regression is the model that, as I told you, is used for predicting continuous numerical values. It's very important that you focus on this term called prediction. Prediction of the output value based on a single independent variable is what simple linear regression does. If you are looking at multiple linear regression, then you will have multiple independent variables, and the output will mostly be single. We will see the cases where this doesn't hold, which are very few, but yes, mostly we have a single output and multiple features, which are the independent variables. We will look at general regression later; as of now, we just discuss simple linear regression. Let's keep it step by step so that you are very much aware of what you have done in the session so far. Right. So first things first: to understand simple linear regression, you should first understand what an independent variable, or independent feature, is and what the output is. For example, let's say we have a certain input variable X. What is this input variable? Let's say we have experience as the input variable, and we are going to predict the package. Right now it is I who am predicting the package, but later on this data will be fed to the machine. It will automatically take up this data, then we will have some testing data that we give the machine and test it on, and finally, when the error value comes in, the error should be nullified or very small, so that the output is very optimal. Right. So suppose four years of experience leads to a package of, let's say, 7 to 8 lakh, or maybe 7.8 lakh. Since this is a regression model, we can have the value 7.8.
Then you have five years of experience leading to a package of, let's say, 8 lakh. Right. Then you have, let's say, six years of experience with 8.9 lakh. So this is how simple linear regression works. But when you feed this kind of data into a model, what exactly goes on behind the scenes when you use Python or another programming language? Actually, there are two programming languages that we commonly use for implementing these models: one is Python and the other is R. We are mostly going to look at Python here, because it is more widespread and has easier implementations. So we might have some Python sessions as well, where I am going to use these algorithms to show you what outputs we get. Right. So right now, just take a look at this: in simple linear regression, we have a single independent variable and an output variable, which we usually call y and plot on the y-axis. These are the x- and y-axes. Here I can plot the experience, and based on the experience we plot the package. Right. So we have this output, and now we look for the line of best fit, which will be somewhere here. This is what we are going to look at today: how we mathematically arrive at the line of best fit. Okay. So I told you about simple linear regression. If I have to talk about multiple linear regression, let me say a little bit about it. In multiple linear regression we might have, let's say, a feature called CGPA, then we might have an age factor, we might also have experience, and we might also have the IQ of a person, because that can also be an input feature. All of these are input features, and finally we come to the output feature. So you can see that there are multiple input features in multiple linear regression.
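The experience-to-package numbers just mentioned can be tried out directly in Python, the language the session plans to use for implementation. This is only a minimal sketch, not the session's own code: it assumes the three (experience, package) pairs discussed, (4, 7.8), (5, 8.0), (6, 8.9), and uses `numpy.polyfit` with degree 1 to get the least-squares line.

```python
import numpy as np

# (experience in years, package in lakhs) pairs from the discussion
experience = np.array([4.0, 5.0, 6.0])
package = np.array([7.8, 8.0, 8.9])

# Degree-1 polyfit gives the least-squares line: package ≈ m*experience + c
m, c = np.polyfit(experience, package, 1)
print(f"slope m = {m:.2f}, intercept c = {c:.2f}")

# Once fitted, the line predicts packages for unseen experience values
pred_7 = m * 7.0 + c
print(f"predicted package at 7 years: {pred_7:.2f} lakhs")
```

With these three points the fitted slope comes out to 0.55 lakh per year of experience; the same line then answers "what package should 7 years fetch?", which is exactly the prediction task described above.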
Correct. So CGPA can be 8 or 9; of course, in India it goes up to 10, and in some other countries it goes up to 5, so it depends on the country you are in. Then the age of a person may be 24, 25, or let's say 26, experience maybe 2, 3, or 4 years, IQ in whatever unit you think is appropriate for IQ, and then you have the output, which is the package. So there are several inputs we look at when we talk about multiple linear regression, and only one output, the package, which is based on those multiple inputs. Right. So this covers two of the most vital models in data science. Then you have simple linear regression, as I told you. Yeah. So I was talking about simple linear regression. Here you can see that I have plotted two independent plots for regression. In the first plot you can see the line of regression, which is what we are going to predict, and according to this line of regression we have the final outputs: our machine is now capable of giving you the desired values. Here there are some predicted values, and on the line of regression are the values we desire. The points above it lie above the line of regression, and the points below lie below the line of regression. Now I'll tell you how these come up, but before that, look at the right-hand graph. There we have the distances: the distances between the actual values and the line of regression. These lines that you are able to see, these are the errors. Right. And the machines are here to minimize those errors. Today we will formulate a function that is going to give you the full idea of how this works. Correct. So the blue lines that you can see, these are the errors. Okay.
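The multi-feature setup just described (CGPA, age, experience, and IQ predicting the package) can be sketched as ordinary least squares in NumPy. Everything numeric here is hypothetical: the feature rows are made up, and the packages are generated from assumed weights purely so the example can check itself; nothing below comes from the session's own data.

```python
import numpy as np

# Hypothetical rows: [CGPA, age, experience (years), IQ]
X = np.array([
    [8.0, 24, 2, 110],
    [8.5, 26, 3, 120],
    [9.0, 25, 4, 115],
    [7.5, 23, 2, 105],
    [8.8, 28, 5, 125],
])

# Assumed "true" weights [b0, b1, b2, b3, b4], used only to
# synthesize package values for this illustration
true_w = np.array([-20.0, 2.0, 0.1, 0.8, 0.05])

# Prepend a column of ones so b0 acts as the intercept:
# package = b0 + b1*CGPA + b2*age + b3*experience + b4*IQ
X1 = np.column_stack([np.ones(len(X)), X])
y = X1 @ true_w  # synthetic packages in lakhs

# Recover the weights by least squares
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("fitted weights:", np.round(w, 3))
print("fitted packages:", np.round(X1 @ w, 2))
```

The only structural change from simple linear regression is that `X` has several columns instead of one; the single output (the package) stays on the left-hand side, just as the session says.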
Some values are on top, some are at the bottom, and we want all the values on the same line. Okay. So this is about linear regression. Correct. So let's carry on. Is the text visible? Yes, it is. Right. To understand linear regression, we should first know what a line is. Okay, I'll plot some values for you: one, two, three, four on one axis, and one, two, three, four on the other. Your two-dimensional plane has been intersected and you have four quadrants: first, second, third, and fourth. Right. If I plot a line from here to here, you can see that I have plotted a line where the y value and the x value are the same. So when my value of x is one, y is also one, and when x is zero, y is zero. This is how we have the table for it. Right. So at x = 0 we have y = 0, at x = 1 we have y = 1, at x = 2 we have y = 2, at x = 3 we have y = 3. So you can look at this line, which is plotted from the data. Now, why am I telling you this? Because we need to know what a line is. The equation of this line I can just write down as y = x. Correct. There is no difference; we can just say that y = x is a line. However, suppose I mark one, two, three, four, five, six on one axis and one, two, three, four, five, six on the other, and at the x value of two I am going to get the y value of four. Let's say at zero we still have zero, and at four we have the y value as eight. What I can take out of this sort of line is that y = 2x. Right. My sketch of the line is not exact, but if you plot it in any tool you will get the exact line for y = 2x, and the values I have chosen are: at x = 0 we get y = 0, and at x = 1 we get y = 2, which I also just plotted.
At x = 2 I will get y = 4, and at, let's say, x = 4 I have y as 8. You can take any values; these are just the ones I took, given that you have the linear equation, or you may call it a linear relation, up to you. So these are the lines that we have, right? You should know what lines are because we are going to use y = mx + c as the equation of the line. First you should understand that this is what y = mx looks like: it hasn't got any intercept, because it passes through the origin. But when I have a practical application of it, let's say I am plotting the salary of a fresher on the graph, I just can't say that y would be equal to mx. Why? Because if x holds the experience and y holds the salary, can you say that for a fresher with zero experience the salary is equal to zero? Can we say so? We certainly cannot say that the salary of a fresher will be zero, because the fresher needs to have at least some salary. That's why we need an intercept, and y = mx + c comes into play, right? This is what the value c is. So let's say the initial value of the salary at zero experience is 3 lakhs; then we can say that c is equal to 3 lakhs over here, given x = 0. Okay, so in y = mx + c, c is called the y-intercept, and I have explained what the intercept does; then you have m as the slope of the line, which is how it is referred to; and y is the output that you get. This is the extract of linear algebra that carries into our artificial intelligence and machine learning topic, and this is just a bit of it, because we are going to go a little further ahead. If you have any questions, please post them.
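The step from y = mx through y = mx + c can be sketched in a few lines of Python. The slope and intercept numbers below are hypothetical, chosen just to mirror the discussion: lines through the origin for y = x and y = 2x, and then a salary line where a fresher with zero experience still earns some base amount, say 3 lakhs.

```python
def line(x, m, c=0):
    """y = m*x + c: m is the slope, c is the y-intercept."""
    return m * x + c

# Lines through the origin: y = x and y = 2x (no intercept, c = 0)
print([line(x, 1) for x in range(4)])  # [0, 1, 2, 3]
print([line(x, 2) for x in range(4)])  # [0, 2, 4, 6]

def salary(exp):
    # Assumed pay line: 1.2 lakhs per year of experience + 3 lakhs base.
    # The intercept c = 3 is why salary(0) is not zero for a fresher.
    return line(exp, 1.2, 3.0)

print(salary(0))  # 3.0 -> the y-intercept, not zero
print(salary(5))
```

The point of the intercept shows up directly: `salary(0)` returns the base salary, whereas a pure y = mx line would force it to zero.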
Yeah, so guys, if you have any question, just post it in the chat so that I can pick it up. AI is basically algorithms, and you can understand these algorithms with the help of the mathematics that is being explained right now. Okay, so if you are able to understand, that is good; if not, just go back and watch it two or three times so that you can relate to what is being explained. Right, so initially it will be time-consuming, but once you are able to understand one or two algorithms, you will get up to speed and be able to understand all the algorithms properly. I don't see any questions as of now, so we can go ahead and cover a few more things. We are going to continue with linear regression, the simple one. Okay, all right. So up till now you have seen what a line is and how the linear equation is generated. Now the slope is yet another thing you need to understand, what exactly it is. I had posted a picture of the stuff we are discussing, so I think I have it over here. Yeah, I think if you can explain with this as well, it would be good. Yes. So this is the equation of the regression line, as you can see, where b1 (or you can say beta 1) is the slope of this line, which determines where exactly this line stands, which is the best position of the line, and according to this, which errors are to be sorted out. You know, the outputs are here, and we are actually going to eliminate these differences; these are the errors that we have, and this is what we are going to minimize. Correct. So now, what is this slope over here? We have the x-axis, as you can see. The slope of a line tells you exactly what change in x would give you the corresponding change in y. If there is a unit change in x, what will be the change in y? If there is a two-unit change in x, what will be the change in y? This is what we are going to calculate over here. So suppose we have
the input variable as experience. Correct. So what change in experience would lead to a higher package, and what experience would lead to a lower package? This is what we are going to look at. Ultimately, when you run this algorithm in Python, you will have to input the training data set and the testing data set, and then finally you will have your final output, which will look like the one I already showed you on the screen. Correct. So this is a very, very important slide. As I already wrote, y = mx + c is the equation of the line, and here we have the practical application of that in the regression model: y = b0 + b1x is the equation of the regression line, where b1 is the slope, which gives you the change in y with respect to x, that is, for a given change in x, how much y changed; and b0 is the intercept, the value of y when x = 0. You can correlate it with this: if you have zero experience, you should still have at least some salary in hand, right? Freshers without experience are not expected to work at zero salary, of course not. So this point here is the y-intercept, which is where the regression line cuts the y-axis. This is the gist of simple linear regression. Correct. Next we will have to formulate another formula that is going to give you the error, but I suggest you go through these three points, which are the main purposes of linear regression. Simple linear regression is used for three main purposes. First, to describe the linear dependence of one variable on the other, that is, the linear dependence of the y variable on the x variable. Correct. Second, to predict values of one variable from values of another variable: the values of experience were given, and from this experience we are going to predict the package of that person. This is what is done
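The slope-and-intercept reading of y = b0 + b1x can be made concrete with the standard closed-form least-squares estimates. The error-minimization that produces these formulas is the derivation the session defers to next time; this sketch just states the well-known result, with hypothetical data points.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit for y = b0 + b1*x.

    b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2)
    b0 = mean_y - b1 * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope: change in y per unit change in x
    b0 = mean_y - b1 * mean_x  # intercept: predicted y at x = 0
    return b0, b1

# Points lying exactly on y = 2x + 1 are recovered exactly
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Reading `b1` as "change in y per unit change in x" is precisely the slope interpretation described above, and `b0` is the value the line predicts at zero experience.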
in this algorithm. The third one is to correct the linear dependence of one variable on another in order to clarify other features of its variability; this you will understand after we formulate the final error function. Correct. How much time do we have? Maybe we still have time? Five minutes. Five more minutes, yeah. Do you need more time? No, that's okay. Yeah, I think we can complete simple linear regression in today's session. I was thinking we could finish the formula as well, but since we just have five minutes, we might derive the formula in the next class. Still, I hope you've got the basic gist of what regression does: it is going to predict values. The example I've taken is the salary of a person, given the experience; salary or package, whatever you call it. This is how the data is plotted: we have the dataset with us, we plotted the data points, and this is the output, the line of regression, that we got. Similarly, in the second one, we have TV as the input and the sales of TVs as the output. How many TVs were sold is given on the x-axis, and based on it, the sales figure is what is calculated. Our model is expected to predict that if I sell 100 TVs, this is the sales we should have; if I sell 150, the sales will be around 20. Okay, so our model is now capable enough to predict what output it should give you. Correct. This is the simplest algorithm, or I should say the simplest mathematical model, that we are discussing today, simple linear regression. After this, the level is only going to rise in terms of the mathematics and the implementation of algorithms. So let us have a quick read of how simply I can explain it to you: simple linear regression is a way to figure out how one thing, let's call it x, can predict or explain the other
thing, which is called y. They take x and y because on the plot we have one x-axis and one y-axis, but you could have taken other variable names as well; that's not a problem. Imagine you're trying to understand how the number of hours people study relates to their test scores; I took this example in the previous class, where I told you about a person studying for x hours. Simple linear regression helps you find a straight line, called the line of best fit, that shows the relationship between these two things. In the simplest terms, it answers questions like: if someone studies for x hours, how well are they going to perform on the test? So this is what you have created: you are going to plug in, say, "I have studied for five hours; will I pass, or what are my chances?" Similarly, suppose you put in the data of a class you have taken, say 60 students. For those 60 students you plot the data, the hours for which the different people studied, and the output is generated for which students are going to pass and which are going to fail. This is done by the model you've created. So simple linear regression is like drawing a line through data points to understand how one thing connects to the other; it's a simple way to make predictions and see patterns in the real world. We have already discussed it; I have just put it in a nutshell, nothing beyond that has been delivered. You can all go through these four points after we are done with the lecture, and if you can revise them, that will be great, because to follow the next sessions you certainly need a grasp of this one. Right. Lastly, I would just say that in regression the order of variables is very, very important. You can't swap x with y and y with x. Your input variable, whatever you call it, you have to use in
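The study-hours question ("if someone studies for x hours, how well will they do?") can be sketched the same way as the salary example. The (hours, score) pairs below are made-up illustration data, not numbers from the session; the fitted line then answers the five-hours question directly.

```python
import numpy as np

# Made-up (hours studied, test score) pairs, purely for illustration
hours = np.array([1, 2, 3, 4, 6, 8])
scores = np.array([35, 45, 50, 62, 70, 88])

# Line of best fit: score ≈ m*hours + c
m, c = np.polyfit(hours, scores, 1)

# "If someone studies for 5 hours, how well will they do?"
pred_5 = m * 5 + c
print(f"predicted score for 5 hours: {pred_5:.1f}")
```

Note that regression predicts a continuous score; turning that score into pass/fail would be a classification step layered on top, as distinguished earlier in the recap.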
that way only, and mostly everywhere it is in the form y = mx + c, so it's better that you use it that way only. The input variable goes on the horizontal axis and the output variable is generally delivered on the vertical axis. The explanatory variable always belongs on the x-axis and the response variable always belongs on the y-axis. This is your two-dimensional linear regression model. Okay, so I think the formula is still pending, so maybe in the next session we can have a quick recap and then you can explain how we build the formula that is left. Right. Okay, guys, so go through the video once again if you have any doubt and try to understand it. This way we'll have an explanation of each and every algorithm, and you will have the mathematical calculations as well. Maybe in some of the videos, some of the sessions, we'll try to have some practical implementation with the help of Python; we are planning that as well, so that you can understand those Python implementations. We are working on that, and soon we'll decide how we can show you the practical implementation. So I hope you enjoyed the session and have gained some knowledge in the direction of AI. Okay, so thank you so much, Nikita, for sharing your knowledge on this topic, which is linear regression. I think you just covered one part of it, right? It has two more branches, right? Okay. Yeah, so guys, we'll be having one more session this week; I will share the link with you soon. It will most probably be on Thursday, and Nikita will be covering the remaining part of simple linear regression, and we'll be picking another algorithm so that you can understand that as well. Okay, thank you so much for watching this session. Thank you, Nikita, for sparing some time. Thank you. So, thank you.