Have you ever wondered how to calculate beta sub 0 and beta sub 1, the intercept and the slope in simple linear regression? In this video I'm going to show you just how easy that derivation is. Here we have a typical dataset where for every x value we have a y value. We want to predict a y value, our outcome, for every x value, our feature. To do this we can create a linear function. Which function best represents the data points? Is it this one? Or this one? Or this one? Perhaps this one is best. Note that it is a simple linear function of the form y equals the slope times x plus the y-intercept. But our data points do not lie on this line. For every x value, the actual y value is some distance away from the predicted y value at that point. These distances are called errors. To get the actual y value, we have to add the error value epsilon to the prediction, and epsilon may be positive or negative, depending on whether the point lies above or below the line. Through simple algebra we can rearrange this into a function for the error. This equation gives the error for every data point, and to stick to statistical notation we replace the y-intercept with beta sub 0 and the slope with beta sub 1, so epsilon sub i equals y sub i minus beta sub 0 minus beta sub 1 x sub i. Since some errors are positive and some are negative, we simply square each of them, which makes them all positive. Now the sum of all the squared errors is written like this. Note that it is a multivariable function in two variables, beta sub 0 and beta sub 1. The x sub i and y sub i values are constants: they are our data points. We need to minimize this error. Remember that a function of two variables is simply a surface in 3D space, with the sum of squared errors on the z-axis, and this surface is bowl-shaped (as long as the x values are not all the same), so it has a lowest point somewhere. Finding it is easily done by taking the partial derivatives with respect to beta sub 0 and beta sub 1 and setting both of them to 0. Let's take a look at how to take these partial derivatives. So now we have this equation for the sum of squared errors. Let's have a look at it.
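To make the idea concrete, here is a minimal Python sketch of the sum of squared errors as a function of the two coefficients. The function name `sse` and the four data points are hypothetical, chosen only for illustration. The point is that the data stay fixed while beta sub 0 and beta sub 1 vary, so every candidate line corresponds to one point on the 3D surface.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors S(b0, b1) for the line y = b0 + b1 * x.

    xs and ys are the fixed data points (constants); b0 and b1 are the variables.
    """
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy dataset, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

# Each (b0, b1) pair is one candidate line, i.e. one point on the surface S(b0, b1).
for b0, b1 in [(0.0, 1.0), (1.0, 1.2), (0.5, 1.4)]:
    print(f"b0={b0}, b1={b1}, S={sse(b0, b1, xs, ys):.4f}")
```

Of these three candidate lines, the last one gives the smallest sum of squared errors for this toy data.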
I'll call it S, and remember that it is a function of two variables, beta sub 0 and beta sub 1, and it is simply the sum of y sub i minus beta sub 0 minus beta sub 1 x sub i, all squared. I've just distributed the negative sign in there. It is a three-dimensional surface, and somewhere it is going to have a minimum, and that is what we want: the line we can draw at that minimum. We want the two coefficients that minimize those little errors, because that gives us the best line. So this is a multivariable function, and we're going to take its partial derivatives, starting with the partial derivative with respect to beta sub 0. That is just the same as having a function in x and y and taking the partial derivative with respect to x and then with respect to y. It is confusing, of course, that we have x and y in here, because we're not used to these terms: our two variables are beta sub 0 and beta sub 1, and y sub i and x sub i are just constants. They are the two values in your data columns: for every x there's a y, for every x there's a y, for every x there's a y. So those are constant values. These are constants and those are my coefficients; we're just not used to seeing it written like that. But how do we take the partial derivative of something inside a sum? Let me show you how easy that is. At least for me, the easiest way is to expand the sum. Remember that this summation goes from i equals 1 to n, where n is the number of data points, or pairs of values, that I have. Let's expand it; it's quite easy. Writing S out, the first term is y sub 1 minus beta sub 0 minus beta sub 1 x sub 1, all squared, plus the next term, y sub 2 minus beta sub 0 minus beta sub 1 x sub 2, squared, plus, and it just carries on: the next one will be 3, then 4, then 5, all the way up to n.
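For reference, here is a sketch of the expansion described above in LaTeX, using the same symbols as the video (beta sub 0, beta sub 1, and the data points x sub i, y sub i for i from 1 to n):

```latex
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2
= \left(y_1 - \beta_0 - \beta_1 x_1\right)^2 + \left(y_2 - \beta_0 - \beta_1 x_2\right)^2 + \cdots + \left(y_n - \beta_0 - \beta_1 x_n\right)^2
```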
So that's just a sum of simple polynomials, and we know how to use the chain rule to get the derivative. Let's take the partial derivative of S with respect to beta sub 0. It is going to equal, well, I can differentiate this; it's easy to do. For the first term, bringing the power down gives 2 times (y sub 1 minus beta sub 0 minus beta sub 1 x sub 1). Now I have to multiply by the derivative of the inside with respect to the variable we're differentiating by. y sub 1 is a constant, so its derivative is 0, and the derivative of minus beta sub 0 is negative 1. We're differentiating with respect to beta sub 0, not beta sub 1, so beta sub 1 x sub 1 is a constant and nothing happens there. So the inside derivative is just negative 1. Plus, for the next term, again I bring a 2 in front: 2 times (y sub 2 minus beta sub 0 minus beta sub 1 x sub 2), times the derivative of the inside, which is negative 1 again, plus, and we carry on. Now we set this equal to 0. Remember that to find a minimum or a maximum we take the derivative, and wherever the derivative, our slope, is 0, that is where we get a minimum or maximum. If we set this equal to 0, we can take the 2 and the negative 1 out as common factors, so there's a negative 2 in front, and I divide both sides by it. 0 divided by negative 2 is still 0, so nothing changes on that side. So all I have left is y sub 1 minus beta sub 0 minus beta sub 1 x sub 1, plus y sub 2 minus beta sub 0 minus beta sub 1 x sub 2, plus, et cetera, equals 0. Now I see I have y sub 1 plus y sub 2 plus y sub 3 plus y sub 4 and so on, and that's just the sum of all the y sub i's. I also have negative beta sub 0, negative beta sub 0, negative beta sub 0, once in every term, and there are n terms, so that's minus n beta sub 0.
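One way to sanity-check the chain-rule result is to compare it with a numerical derivative. The sketch below is an illustration under assumptions: the helper names and toy data are hypothetical, and the finite-difference step `h` is an arbitrary small value. It compares the analytic partial derivative, minus 2 times the sum of (y sub i minus beta sub 0 minus beta sub 1 x sub i), with a central difference in beta sub 0 while beta sub 1 is held constant.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def dS_db0(b0, b1, xs, ys):
    """Chain rule: each term contributes 2 * (y_i - b0 - b1 * x_i) * (-1)."""
    return -2 * sum(y - b0 - b1 * x for x, y in zip(xs, ys))


def numeric_dS_db0(b0, b1, xs, ys, h=1e-6):
    """Central finite difference in b0 only; b1 and the data are held fixed."""
    return (sse(b0 + h, b1, xs, ys) - sse(b0 - h, b1, xs, ys)) / (2 * h)


# Hypothetical toy dataset, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

print(dS_db0(0.0, 1.0, xs, ys), numeric_dS_db0(0.0, 1.0, xs, ys))
```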
And what do we have left? I can take negative beta sub 1 out of each of the remaining terms, and I'm left with x sub 1 plus x sub 2 plus x sub 3 and so on, which is just the sum of the x sub i's. So the sum of y sub i, minus n beta sub 0, minus beta sub 1 times the sum of x sub i, equals 0. All I have to do now is solve for beta sub 0. Taking that term over to the other side gives n beta sub 0 equals the sum of y sub i minus beta sub 1 times the sum of x sub i, so beta sub 0 equals the sum of y sub i minus beta sub 1 times the sum of x sub i, all divided by n. So I now have an equation for beta sub 0. Unfortunately it has beta sub 1 in it, but I can take the partial derivative with respect to beta sub 1, which gives me a second equation that also contains both coefficients, and then substitute this expression for beta sub 0 into that second equation. I'm going to clean the board, and I'll show you how to take the partial derivative with respect to beta sub 1. So we've cleaned the board, and I've put the equation for beta sub 0 on this side. Now let's take the partial derivative with respect to beta sub 1. Again I'll do an expansion: I still have my function S, which is a function of beta sub 0 and beta sub 1, and expanding it gives y sub 1 minus beta sub 0 minus beta sub 1 x sub 1, squared, plus y sub 2 minus beta sub 0 minus beta sub 1 x sub 2, squared, plus, and it carries on for all n terms. It's very simple to take the partial derivative using the chain rule. Using the chain rule, I'm left with 2 times (y sub 1 minus beta sub 0 minus beta sub 1 x sub 1), and I've got to multiply by the derivative of the inside as well. Remember that y sub 1 is a constant, beta sub 0 is a constant now, and x sub 1 is a constant.
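Returning briefly to the expression for beta sub 0, here is a small Python sketch of it (hypothetical helper names and toy data). Whatever slope we plug in, this beta sub 0 makes the partial derivative with respect to beta sub 0 vanish. That is exactly why a second equation is needed to pin down beta sub 1. Dividing through by n, this is the same as the mean of y minus beta sub 1 times the mean of x.

```python
def b0_given_b1(b1, xs, ys):
    """Solve the first normal equation, n * b0 = sum(y) - b1 * sum(x), for b0."""
    n = len(xs)
    return (sum(ys) - b1 * sum(xs)) / n


def dS_db0(b0, b1, xs, ys):
    """Partial derivative of the sum of squared errors with respect to b0."""
    return -2 * sum(y - b0 - b1 * x for x, y in zip(xs, ys))


# Hypothetical toy dataset, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

# For any slope b1, this choice of b0 sets dS/db0 to zero.
for b1 in [0.0, 1.0, 2.5]:
    b0 = b0_given_b1(b1, xs, ys)
    print(b1, b0, dS_db0(b0, b1, xs, ys))
```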
So the derivative of the inside is just negative x sub 1. Plus, for the next term, I bring a 2 out again: 2 times (y sub 2 minus beta sub 0 minus beta sub 1 x sub 2), times negative x sub 2, plus, and it carries on, and I simply set all of that equal to 0. Very easy to do with the chain rule. Now let's take some constants out as common factors. I can certainly take a 2 out every time. I can take the negative 1 out too, but I can't take x sub 1 out, because x sub 1, x sub 2 and so on are totally different constants. So I take the negative 2 out and divide both sides by negative 2; 0 divided by negative 2 is still 0. Then I distribute each x sub i into its bracket. That gives x sub 1 y sub 1, minus beta sub 0 x sub 1, minus beta sub 1 x sub 1 x sub 1, and x sub 1 times x sub 1 is x sub 1 squared. Plus, for the second term, multiplying x sub 2 throughout gives x sub 2 y sub 2, minus beta sub 0 x sub 2, minus beta sub 1 x sub 2 squared, plus, and it goes on for n terms, and that equals 0. All we have to do now is simplify using summation notation. I have x sub 1 y sub 1 plus x sub 2 y sub 2, all the way to x sub n y sub n, all added to each other, so in summation notation that's the sum of x sub i y sub i. I have negative beta sub 0 in every term, each time multiplied by a different constant, so I can take negative beta sub 0 out as a common factor, and I'm left with the sum of the x sub i's. If I were to expand this, x sub 1 plus x sub 2 plus x sub 3 up to x sub n inside parentheses, multiplied by negative beta sub 0, I'd get exactly what I had before. I do the exact same thing with beta sub 1: I take negative beta sub 1 out, and I'm left with the sum of x sub i squared, where each x sub i is squared individually. So the sum of x sub i y sub i, minus beta sub 0 times the sum of x sub i, minus beta sub 1 times the sum of x sub i squared, equals 0.
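As a check on this second derivative, the sketch below (hypothetical helper names and toy data, with an arbitrary finite-difference step) compares the chain-rule result for the partial derivative with respect to beta sub 1 with a numerical derivative. It also confirms that the derivative equals negative 2 times the expanded equation: the sum of x sub i y sub i, minus beta sub 0 times the sum of x sub i, minus beta sub 1 times the sum of x sub i squared.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def dS_db1(b0, b1, xs, ys):
    """Chain rule: each term contributes 2 * (y_i - b0 - b1 * x_i) * (-x_i)."""
    return -2 * sum(x * (y - b0 - b1 * x) for x, y in zip(xs, ys))


def second_normal_equation(b0, b1, xs, ys):
    """Left-hand side after dividing by -2 and distributing each x_i:
    sum(x_i * y_i) - b0 * sum(x_i) - b1 * sum(x_i ** 2)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy - b0 * sum(xs) - b1 * sum_xx


def numeric_dS_db1(b0, b1, xs, ys, h=1e-6):
    """Central finite difference in b1 only; b0 and the data are held fixed."""
    return (sse(b0, b1 + h, xs, ys) - sse(b0, b1 - h, xs, ys)) / (2 * h)


# Hypothetical toy dataset, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

b0, b1 = 0.0, 1.0
print(dS_db1(b0, b1, xs, ys), numeric_dS_db1(b0, b1, xs, ys),
      -2 * second_normal_equation(b0, b1, xs, ys))
```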
That's very simple notation now. With summation notation, I have an equation with both beta sub 1 and beta sub 0 in it, but I know what beta sub 0 is, so I can substitute it in. So I have the sum of x sub i y sub i, minus, and here I put in beta sub 0, which is the sum of the y sub i's minus beta sub 1 times the sum of the x sub i's, all divided by n. That is still multiplied by the sum of the x sub i's that I had there. Then I have minus beta sub 1 times the sum of x sub i squared, and that equals 0. We can multiply the other terms by n over n to give everything the common denominator n, take 1 over n out as a common factor, and multiply both sides by n; n times 0 still stays 0. In essence, I've just gotten rid of the common denominator. So I have n times the sum of x sub i y sub i. Now I have to be very careful here, because there's a negative sign in front of the bracket and another one inside it, so don't make a mistake. The first part becomes negative the sum of x sub i times the sum of y sub i. For the second part, I have a negative times a negative, which is a positive, so I get plus beta sub 1 times the sum of x sub i times the sum of x sub i again. I have two of them, so that's the whole sum squared, which is different from the sum of x sub i squared. The last term, after multiplying by n, is negative n times beta sub 1 times the sum of x sub i squared, and all of that equals 0. All I have to do now is get the terms with beta sub 1 on one side. Let's do that. I keep n times the sum of x sub i y sub i, minus the sum of x sub i times the sum of y sub i, on the left; remember, those are two different things. And I take the beta sub 1 terms over to the other side. That leaves me with n beta sub 1 times the sum of x sub i squared.
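The substitution and the multiplication by n described above can be written out as follows. This is a sketch in LaTeX using the video's symbols, with every sum running from i equals 1 to n:

```latex
\begin{aligned}
&\sum x_i y_i - \frac{\sum y_i - \beta_1 \sum x_i}{n}\sum x_i - \beta_1 \sum x_i^2 = 0 \\
&n\sum x_i y_i - \sum x_i \sum y_i + \beta_1\Big(\sum x_i\Big)^2 - n\beta_1 \sum x_i^2 = 0 \\
&n\sum x_i y_i - \sum x_i \sum y_i = n\beta_1 \sum x_i^2 - \beta_1\Big(\sum x_i\Big)^2
\end{aligned}
```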
Bringing the other beta sub 1 term over as well gives minus beta sub 1 times this summation squared, so the right-hand side is n beta sub 1 times the sum of x sub i squared, minus beta sub 1 times the square of the sum of x sub i. Now let's swap the left- and right-hand sides. On this side I take beta sub 1 out as a common factor. What do I have left? I have n times the sum of x sub i squared, minus the square of the sum of x sub i, and that equals what was the left-hand side: n times the sum of x sub i y sub i, minus the sum of x sub i times the sum of y sub i. All I need to do now is divide both sides by the factor multiplying beta sub 1. So beta sub 1 equals n times the sum of x sub i y sub i, minus the sum of x sub i times the sum of y sub i, all divided by n times the sum of x sub i squared minus the square of the sum of x sub i. And there you have it, quite easy to do once you remember that you can just expand the summation. These are ordinary polynomials; just remember which things are constants and which are variables. You're taking partial derivatives, so mostly you just use the chain rule and then bring everything back to summation notation. Substitute what you already know for beta sub 0, because we've already derived that; that's easy. And just don't make a mistake multiplying out the negatives. Keep things like these two separate: the sum of x sub i y sub i is different from the sum of x sub i times the sum of y sub i, and the sum of x sub i squared is different from the square of the sum of x sub i. I brought those two terms over to that side. If I had kept them on this side and taken the others across instead, I would have had to multiply the top and bottom by negative one, which is not an issue at all. I brought them over so I could swap the left- and right-hand sides; no issues. And I'm left with those two beautiful equations for the y-intercept and the slope of my linear regression line for a single variable.
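Putting the two results together, here is a minimal Python sketch of the closed-form formulas, assuming plain Python lists of numbers. The function name `fit_simple_linear_regression` and the toy data are hypothetical. It computes beta sub 1 from the raw sums exactly as derived, then beta sub 0 from the first equation. It then checks that both partial derivatives are zero at the result: the residuals sum to zero, and so do the residuals weighted by x.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares intercept and slope for y = b0 + b1 * x.

    b1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
    b0 = (sum(y) - b1*sum(x)) / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Sum of squares (sum_xx) vs square of the sum (sum_x ** 2): keep them separate.
    denominator = n * sum_xx - sum_x ** 2
    # In exact arithmetic this is zero only when all x values are equal;
    # here we only guard against an exact floating-point zero.
    if denominator == 0:
        raise ValueError("denominator is zero; the slope is not uniquely defined")
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Hypothetical toy dataset, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)

# At the minimum both partial derivatives are zero:
# the residuals sum to 0, and so do the residuals weighted by x.
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
print(sum(residuals), sum(x * r for x, r in zip(xs, residuals)))
```

Computing straight from the raw sums mirrors the derivation. For data with large x values, though, subtracting the square of the sum from n times the sum of squares can lose precision, and the algebraically equivalent centered form (the sum of (x sub i minus the mean of x) times (y sub i minus the mean of y), divided by the sum of (x sub i minus the mean of x) squared) is usually more stable numerically.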