So in this video we're going to look at ordinary least squares, in that we're going to solve for a quadratic, a cubic, and a linear equation in the plane. What I'm going to assume here, as you can see in the introduction, is that you're familiar with vector operations: adding vectors, subtracting vectors, multiplying a vector by a constant, et cetera. Also matrix operations, including the transpose of a matrix, the inverse of a matrix, and matrix-vector multiplication; and, very specifically, the column space of a matrix, the concept of linear independence of the column vectors inside of a matrix, and the projection of a vector onto the column space of a matrix. Even if you're only loosely familiar with these things, I'm going to explain some of them in an intuitive sense so that you understand the use of least squares when we're trying to approximate a solution.

The packages we're going to use in this Jupyter notebook, or rather Google Colab notebook: I'm going to import warnings and set the warnings filter to ignore. I'm going to set the %config magic because this is a retina display and I want the images to be nice and crisp. From matplotlib we're going to import the pyplot module as plt, and I'm going to set a font because I'm going to use a specific theme when I do my plotting. I'm also going to import numpy under the usual namespace abbreviation np, and from the symbolic Python package, sympy, we're going to import the Matrix function; that's an uppercase M.

So here's our example problem. We have two-space, the Cartesian plane with x and y coordinates, and we have these three points: (1.5, 1), (2, 5), and (-1, -2). You can see the picture of the problem here, and that's the theme I'm using, xkcd, very famous; it's just a bit of whimsy as far as our plotting is concerned. First of all we want to find a quadratic approximation to the points. Remember, a quadratic function is a second-order polynomial, a polynomial such as y = x², and we want one that goes through those three points. It also has to go through the origin: we're not allowing any kind of y-intercept, and that's going to make things very difficult. We'll see that there is no quadratic polynomial of this form that goes through all three points.

So let's start with the quadratic approximation of the solution. I show you here y = a₁x + a₂x². That is a polynomial that goes through the origin; in other words there's no y-intercept, we're not adding a constant. But I have these two coefficients, a₁ and a₂. For instance, y = -4x + 2x² is such a polynomial, and it might go through those points; we need to solve for those coefficients. Now, if you're using this in some kind of application, for instance linear regression, we usually call the coefficients beta, and we use a beta hat just to denote that this is an approximation: for instance, we have a sample taken from a population, and we're estimating the population parameters by the sample values that we calculate. So we use beta with a hat on, and for the specific coefficient a₁ we use β₀, and for a₂ we use β₁.
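Before going on, here is a minimal sketch of the setup cell just described. The specific font name is my assumption, since it isn't named in the audio (the xkcd theme wants a hand-drawn font such as Humor Sans), and the %config line is notebook-only magic:

    import warnings
    warnings.filterwarnings("ignore")

    # Notebook magic: crisp inline figures on a retina display
    %config InlineBackend.figure_format = "retina"

    import matplotlib.pyplot as plt
    plt.rcParams["font.family"] = "Humor Sans"  # assumed font for the xkcd theme

    import numpy as np
    from sympy import Matrix

    # The three points of the example problem, plotted in the xkcd theme
    with plt.xkcd():
        plt.scatter([1.5, 2, -1], [1, 5, -2])
        plt.show()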
So in the end we're going to have this equation, and we have to solve for those beta values: y is approximately equal to β₀ times x plus β₁ times x². If we look at the three points, then, in equation (2): remember the first point was (1.5, 1), so 1.5 is my x value, and you can see my first coefficient times 1.5, plus my second coefficient times 1.5 squared, and that should give me 1. The second point was (2, 5) and the third point was (-1, -2). So you see the "y approximately equal to": if I have these values β₀ and β₁, the right-hand side will approximately equal those three y values, given the three x values.

Now we write this in vector notation, and then using a matrix. We put all the y values in a vector, and we're saying that vector is approximately equal to a constant times a column vector: β₀ times my vector of x values, plus β₁ times the vector with each x value squared. If I write this with a matrix, I'm saying y is approximately equal to these two column vectors placed side by side in a matrix, times beta hat. That's what we have in equation (3): the x column vector and the x² column vector in a matrix, multiplied by beta hat, a vector with two values. And you can see that when I multiply this matrix by this vector, I approximately get the vector (1, 5, -2). That's what we write very neatly right at the bottom: my vector y is approximately equal to some matrix times my vector of unknowns, and that vector of unknowns is what I really have to solve for. That's going to be a slight problem, because we see that approximation sign, and we'll see shortly why this is just an approximation.

I just want you to realize one thing: our points were in two-space, and we're going to three-space here. Those two things don't have anything to do with each other. The points we had were in two-space, but I'm taking all the x values and y values and setting up a quadratic, and now we're in three-space because we had three points.

So this is what we're trying to solve for, and over here on this other tab is a little Google drawing of what's going on with this least squares. Now, this is not a representation of our actual problem; it's just a schematic. Our matrix A has two column vectors; imagine these are the two blue vectors there. They're in three-space: you can see the x coordinate, the y coordinate, and the z coordinate. And because there are only two of them, even though they're linearly independent, they can only span a subspace of three-space: this plane through the origin. Since they're linearly independent, they span that plane through linear combinations: if I take some constant multiple of the first column vector plus some constant multiple of the second column vector, I can land anywhere in this plane, but I can't get out of this plane. And my problem is y, the orange vector right up here: it is not in the plane. My vector y is not in the column space of my matrix A, and that's a problem. Fortunately, we can find the projection of y onto the column space, and we do that by dropping a perpendicular onto this plane.
There is a point on this plane, connected out to y, such that the distance is the shortest possible: that is the perpendicular projection down onto the plane. And now you can see we've decomposed y into y-parallel, which I call y-parallel because it's parallel to the plane (it lies in it), and y-perpendicular, the perpendicular component, which is orthogonal to my plane. That makes things very interesting, because any vector in the plane should give me zero in a dot product with y-perpendicular: they are orthogonal. So any vector in this column space, taken in a dot product with y-perpendicular, gives zero. Remember that; it becomes very important.

If we can't solve for y, we can at least solve for y-parallel, because y-parallel is in the column space of my matrix A. Some linear combination of these two column vectors, the blue ones, will give me y-parallel; that's absolutely possible. So I can at least solve for that, seeing as y itself is clearly not in the column space of A.

So here at the bottom, I have my matrix A, consisting of those two column vectors, multiplied by the coefficients in a vector, and it's approximately equal to y. On the bottom left we see that y equals y-parallel plus y-perpendicular; that's just vector addition. And if I isolate y-parallel, that's y minus y-perpendicular. So I can go from the approximation to an equality: I can say A times beta equals y-parallel, because some linear combination of those two column vectors gives me y-parallel exactly. But remember, y-parallel is y minus y-perpendicular.

Now look what happens if I take the transpose: I'm going to left-multiply this equation by A transpose. That has a beautiful effect, inasmuch as A is not a square matrix; in our instance it's going to be a three-by-two matrix, and if I left-multiply it by its transpose, I end up with a square matrix. That's very nice, because if I have a square matrix whose columns are linearly independent, I can take its inverse, and that's going to be very important as well. So look what happens when I left-multiply both sides by A transpose: on the right-hand side I have A transpose y minus A transpose y-perpendicular. But we just established that any column vector of A gives zero in a dot product with y-perpendicular. In A transpose, all the column vectors become row vectors, and I'm taking the dot product of each of them with this perpendicular vector, so all those dot products are zero, zero, zero: I end up with the zero vector. And adding a zero vector does nothing, so I can just throw it away. In the end we're left with just A transpose y. So: A transpose A times beta equals A transpose y.

Now I can attack this problem, and I don't need the perpendicular component, and I don't even need the component that's in the plane. I just need y, and y is exactly what I was given; I didn't have to care about those other pieces. That's the magic of left-multiplying by the transpose. So let's get back to the problem. We've seen how we go from the approximation to an equality if we only consider y-parallel, and here is the full derivation of what's going on.
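In symbols (the equation numbers (5) and (6) follow the notebook as described in the narration):

    \begin{aligned}
    A\hat{\beta} &= y_{\parallel} = y - y_{\perp} && \text{(5)} \\
    A^{T}A\hat{\beta} &= A^{T}y - A^{T}y_{\perp} = A^{T}y - \mathbf{0} = A^{T}y \\
    \hat{\beta} &= \left(A^{T}A\right)^{-1}A^{T}y && \text{(6)}
    \end{aligned}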
I've left-multiplied here in equation (5) by A transpose on both sides, remembering that A transpose times y-perpendicular gives me the zero vector, so that term just falls away; you see the minus-zero-vector in the second line. I also do some reassociation on the left-hand side, so that I'm looking at A transpose A as a unit. Now, we've said that if A has linearly independent column vectors, and in our instance it does, because I have a vector of x values and then a vector of those values squared, and one is not a linear combination of the other (one is the element-wise square of the other, which is definitely not a linear combination), then we're in business. Remember, A is a non-square matrix for us, three-by-two. If I left-multiply it by A transpose, I end up with a square matrix that has linearly independent columns, and for such a matrix I can take the inverse; that inverse will exist. So I multiply by the inverse of A transpose A on the left-hand side and on the right-hand side, and on the left, the inverse of A transpose A times A transpose A itself just leaves me with the identity matrix; it falls away. (The parentheses I'm using at the end, specifically in the last line, have nothing to do with the equation itself; I'm just explaining what's happening.) We end up with equation (6), a solution for the vector beta hat that contains my two unknowns. And that is ordinary least squares: take A transpose times A, take its inverse, multiply by the transpose of A, multiply by y, and that gives me my two solutions.

So if I look at the left-hand side of (7), this is what we have at the moment: my column vector of x values and then my x² values, that's my A, a three-by-two matrix. If I multiply that by β₀ and β₁, it approximately equals y. But I now know that if I go through the transpose, I get an exact solution for beta. Of course it's not exact as far as my final problem is concerned; that's still going to be an approximation.

I also want to show you that this matrix A really does have linearly independent columns. We take the rank of the matrix; remember, the rank gives me the number of linearly independent columns inside a matrix. I'm using sympy's Matrix function there, uppercase M, constructing my matrix and calling the rank method, and we see a rank of two. So those two columns really are linearly independent, which is great, because it means that if we left-multiply by the transpose, we get a matrix of which we can take an inverse. And here's another way to show that all three columns are linearly independent: I create a matrix of the two column vectors in A, I add to that another column vector, which is y, and I reduce it to row-reduced echelon form. If we do that, we end up with the identity matrix. Remember what that means: only the zero vector solves the homogeneous problem; in other words, those three columns are linearly independent.
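A quick sketch of those two checks with sympy's Matrix, assuming the imports from the setup cell above (the variable names are mine):

    A_sym = Matrix([[1.5, 2.25],
                    [2,   4   ],
                    [-1,  1   ]])
    print(A_sym.rank())            # 2: both columns are linearly independent

    # Append y as a third column and row reduce
    Ay = A_sym.row_join(Matrix([1, 5, -2]))
    print(Ay.rref())               # the rref is the 3x3 identity, so all three columns are independent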
And we saw in the little schematic in the Google drawing that they are all linearly independent; that's what I was trying to show there. Once again, that drawing wasn't exactly a representation of this problem, only a schematic of what's going on.

I also just want to remind you how we get an equation for a plane in three-space through the origin. We need a point in the plane, and of course we've got two column vectors lying in our plane, so the tip of either of those two vectors can serve as a point in the plane. And we need a normal vector, an orthogonal vector to that plane. How do we get that? Well, we've got these two vectors in the plane, and if we take their cross product we end up with a vector that's perpendicular to both of them, and that vector is perpendicular to the plane. So that's what we set up there, and we see the uppercase A, B, and C there; those are the components of the normal vector. If we take the cross product of our two column vectors in A, we get the normal vector (6, -3.75, 1.5). We substitute those components in, we take (x₀, y₀, z₀) to be one of the points, the tip of one of our vectors, and we've taken (1.5, 2, -1), and we solve. Of course, the way this problem is set up, with the plane through the origin, the plane's coefficients just end up being the components of our normal vector. And we can make another attempt at showing that our vector y is not a solution, that it's not in the plane: if we plug its components into this equation for the plane, the left-hand side and the right-hand side of (8) are not equal to each other. That point is not in the plane.

So let's use Python and do these calculations. I set up a matrix A, and there we can see the two columns: 1.5, 2, -1 is the first column, those are our three x coordinates, and I square them for the second column because we want a quadratic approximation. I also set up my vector y. You don't actually need all these brackets; you can just use the reshape method if you want to. I've entered the values individually just to make sure this is a column vector. Now we use equation (6): remember, it's the transpose of the matrix times the matrix, you take the inverse of that, and you multiply by the transpose of the matrix times y, and that gives us the vector beta. Of course it has two values in it, a two-by-one matrix, or a two-element column vector, and we see it there. Let's extract those two values individually and store them as beta zero and beta one; I'm just using some indexing to store them. Finally I want to plot this solution, so I create a bunch of x values with numpy's arange function, going from -2 to 3 in steps of 0.05, just so that I have plenty of little dots, and matplotlib will make a line from that. And my solution, remember, is just the quadratic: β₀ times x plus β₁ times x², as sketched below.
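Here is a minimal sketch of those calculations, including the normal-vector check for the plane (variable names are mine):

    # Normal to the plane spanned by the two columns of A (cross product)
    n = np.cross([1.5, 2, -1], [2.25, 4, 1])   # array([ 6.  , -3.75,  1.5 ])
    print(n @ np.array([1, 5, -2]))            # -15.75, nonzero, so y is not in the plane

    # Design matrix for the quadratic through the origin: columns x and x^2
    A = np.array([[1.5, 1.5**2],
                  [2.0, 2.0**2],
                  [-1.0, (-1.0)**2]])
    y = np.array([[1.0], [5.0], [-2.0]])       # y as a 3x1 column vector

    # Equation (6): beta_hat = (A^T A)^{-1} A^T y
    beta = np.linalg.inv(A.T @ A) @ A.T @ y    # approximately [[1.677], [0.129]]
    beta_0, beta_1 = beta[0, 0], beta[1, 0]

    # Values for plotting the fitted parabola
    x_vals = np.arange(-2, 3, 0.05)
    y_vals = beta_0 * x_vals + beta_1 * x_vals**2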
So I evaluate that for all of the values in my array, so that I have x and y values as little dots, and matplotlib creates a line from them. Then, again with the xkcd theme, we create the figure: I put my points in the figure, and I also plot this red line of x and y values, so that we can see both the three points that we had and the quadratic approximation. It does not go through all three points, and I've explained to you exactly why: y is not in the column space. There you can see the best approximation we can find. Of course, we did not put in a y-intercept, so it goes through the origin, and there are my three points; it is an approximation. You can see it starting to curve, so it is a parabola, and it is the best fit to these three points.

Now, if we bump this up and make it a cubic, if we try a cubic approximation, we actually get a cubic equation that goes through all three points. Let me show you. In equation (10), if we set up this problem, we're going to have x plus x² plus x³, a third-order polynomial; that makes it a cubic equation, and it is going to solve. But now we need three coefficients for this problem: β₀, β₁, and β₂. So let me create this matrix, and I'm just going to show you that these columns are linearly independent: the column vector x, the column vector x², and the column vector x³. I take the row-reduced echelon form, and this is using sympy; if you don't use sympy's Matrix function, you won't have this method available to you, it's not in numpy. We see the identity matrix there, so those columns are definitely linearly independent.
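As a sketch, the cubic design matrix and its row reduction (1.5³ = 3.375, 2³ = 8, (-1)³ = -1), again with sympy's Matrix:

    A3 = Matrix([[1.5, 2.25, 3.375],
                 [2,   4,    8    ],
                 [-1,  1,   -1    ]])
    print(A3.rref())   # the rref is the identity, so the columns x, x^2, x^3 are independent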
So let's create a numpy matrix A (remember, y is unchanged), and I use exactly the same ordinary least squares equation, equation (6): the inverse of A transpose times A, times A transpose times y. I get a solution, and this time it has three elements, a three-by-one vector. Then I save them individually again, I create my x values and y values so that I can plot them, and finally we plot the whole solution. You can see that with a cubic it goes through all three points; if we had four points and wanted to go through all four, we would of course need a fourth-order polynomial. Now, this might look neat, but it's not always what we want. Most of the time, in applications, what we want is a linear approximation, because we don't want to overfit the data: this cubic fits our data, but when we want to use it on unseen data, it might not do so well.

So let's do a linear approximation. This is going to be a bit different. We know a straight line in two-space: y = mx + c, we have it all the way up there, where m is the slope and c is the intercept. For the slope we're going to have β₁, and for the y-intercept we're going to have β₀. The way we set it up here, in equation (11), is a bit different from what we've seen before. There's my vector y, (1, 5, -2), all those values, and on the right-hand side we see the vector of x values, (1.5, 2, -1), which we multiply by the coefficient β₁. But we need this β₀ as well, and to make the vector equation work out, we just put in a vector of all ones, because the scalar times each of those ones is just going to be β₀, β₀, β₀. So I can read it off again: the 1 is going to be approximately equal to β₀ times 1, which is just β₀, plus 1.5 times β₁, and that's still an approximation.

I also want to point out the slightly different notation in (12), where I've put a hat on the y. In applications we only get this approximate value of y, y hat; we don't get y itself. I'm just showing you a bit of different notation, but that's not really what this explanation is about, so stick to the original notation that we had.

Now, if we want the plane again, remember it's just the cross product of these two column vectors, (1, 1, 1) and (1.5, 2, -1). If we take their cross product, as we do there, we can create the equation for our plane, and I show you what that plane would look like. Then I create matrix A, now with (1, 1, 1) as the first column vector, and we do exactly what we did before: there's my solution using equation (6), ordinary least squares. I save β₀ and β₁, I create a bunch of values so that I can plot them, and there is the magic: we can see the linear approximation of our problem. In applications this is usually what we would go for. We want this linear approximation: it does not fit the data as well, it is just an approximation, but it will probably generalize much better to unseen data when you use this linear regression to create a model. So there we go: ordinary least squares and how we
use it to approximate a model, at least for the points in a plane.
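And a minimal sketch of that final linear fit, following the same pattern as before (the plotting details are my assumptions, in the spirit of the earlier figures):

    # Design matrix for the line y = beta_0 + beta_1 * x: a column of ones and the x values
    A = np.array([[1.0, 1.5],
                  [1.0, 2.0],
                  [1.0, -1.0]])
    y = np.array([[1.0], [5.0], [-2.0]])

    # Equation (6) once more
    beta = np.linalg.inv(A.T @ A) @ A.T @ y
    beta_0, beta_1 = beta[0, 0], beta[1, 0]    # intercept ~ -0.306, slope ~ 1.968

    x_vals = np.arange(-2, 3, 0.05)
    y_vals = beta_0 + beta_1 * x_vals

    with plt.xkcd():
        fig, ax = plt.subplots()
        ax.scatter([1.5, 2, -1], [1, 5, -2])   # the three data points
        ax.plot(x_vals, y_vals, color="red")   # the least squares line
        plt.show()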