So here we are in a Pluto notebook, and we're going to use Julia to try to understand ordinary least squares, at least as it pertains to fitting quadratic, cubic, and linear equations given some points in the plane. I'm going to assume that you have knowledge of the following: vector operations, specifically vector addition; matrix operations, including taking the transpose of a matrix, taking the inverse of a matrix (and knowing when that inverse exists), and matrix-vector multiplication; and, importantly, the column space of a matrix — I'll show you a little schematic later to remind you what that's all about. You should also know about linear independence, that is, whether the column vectors that make up a matrix are linearly independent, and about the projection of a vector onto the column space of a matrix — that column space is a subspace, and we'll project a vector onto that subspace. The Julia packages we're going to use are the Plots package, with Plotly as the backend as you can see, the LinearAlgebra package, which is part of Julia's standard library, and the RowEchelon package — so if you're setting up your environment, add or install RowEchelon as well. Here's our example problem, and we can see a nice neat little Plotly graph of it. I want to find a quadratic equation (later on we'll do cubic and linear as well) through the origin of two-space and through the points (1.5, 1) — there's a little one here — the point (2, 5) out there, and the point (−1, −2). So I have to find a parabola, basically — that's what a quadratic equation, a second-order polynomial, is — that goes through these three points and the origin. I'm not going to have a y-intercept; my y-intercept is going to be zero. And you can see the code there in case you're interested in how I generated this neat plot.
So let's think about the quadratic approximation. First of all, how would you set this problem up? You can see how it's set up in equation number 1, because a quadratic polynomial that goes through the origin, with no y-intercept, is just y = β̂₀x + β̂₁x² — some coefficient times x plus some coefficient times x². I'm using these symbols β̂₀ and β̂₁ with hats on them because, when we build linear models using samples from a population, those are usually the symbols we use. The little hats are just to remind us that these are approximations — parameters or values that I calculate from a sample while trying to get an idea of what's going on in a population. So if we write it out like this, I'm going to put the approximation symbol there, and you'll see a little later why: I'm saying some vector y is approximately equal to β̂₀, a scalar coefficient, times the vector x, plus β̂₁, again a scalar coefficient, times a second vector, which is the square of each element of x. And in equation 2 it makes intuitive sense what's happening here.
Take my point (1.5, 1) in the plane. I'm saying 1 is going to be approximately equal to — I'm going to get very close to 1, but I won't get exactly to 1, and we'll see why — my x value and my x² value, each multiplied by one of the two coefficients. Exactly the same for my point (2, 5), and for my point (−1, −2) in the third line of equation 2. And you can see in equation 3 how we set this up with vectors: my vector y of y values is approximately equal to a constant multiple of my vector of x values plus a constant multiple of the vector with each value squared. If I write this with a matrix, the vector y is approximately equal to a matrix whose columns are x and x² (which you see in the third line) times a vector containing the elements β̂₀ and β̂₁. So in the end I have a very neat way of writing it: my vector y is approximately equal to some matrix A times a vector β. Now I just want to show you a schematic of what's going on. This is just a Google drawing — it's not the actual vectors in our problem. We have three-space: our x axis coming out toward us, our y axis, and our z axis going up, and we're representing the idea of the two column vectors of A. They are linearly independent of each other — we'll prove in our problem that they are — but since there are only two of them, they cannot possibly span all of three-space. They span a subspace of three-space, and that subspace is a plane through the origin. So a linear combination of those two column vectors will get me anywhere in that plane, but it certainly can't get me to y — y is outside of the plane.
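Concretely, the vectors and the matrix A from equation 3 can be set up in Julia like this (a minimal sketch; the variable names are my assumptions, not necessarily the notebook's):

```julia
# The three points (1.5, 1), (2, 5), (-1, -2) from the example problem
x = [1.5, 2.0, -1.0]   # x coordinates
y = [1.0, 5.0, -2.0]   # y coordinates

A = [x x.^2]           # 3×2 matrix: first column is x, second is x squared element-wise
```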
So I will not be able to find a vector β̂ such that multiplying the matrix A by it gets me out of this plane. What I will be able to do is get to y∥ — y-parallel is in the plane. Look at y-parallel, in orange: it is in the plane, so a linear combination of the two column vectors of A will get me to y-parallel. And how do we get y-parallel? It is the orthogonal projection of y onto my plane. The other piece, y⊥, that goes out from the plane is perpendicular to every vector in the subspace, and its length is the shortest distance from y to the plane — that's where the idea of "least squares" in ordinary least squares comes from: it is this distance that we're trying to minimize, in essence. But we're setting this up geometrically, so we don't have to worry about explaining it in that form. So we've decomposed y such that y = y∥ + y⊥. And one thing I want you to think about: y⊥ is orthogonal to every vector in the column space, so if I take the dot product of any vector in this plane with y⊥, I'm going to get zero. Now think about what happens if I take the transpose of A. Every column vector of A, dotted with this perpendicular vector, should give me zero. When I take the transpose of A, each of those column vectors becomes a row vector, so I'm taking a row vector times this column vector, and that gives zero, because I'm dotting two perpendicular vectors with each other. And if I do that for all the rows of Aᵀ — which, remember, are all the columns of A — I get zero, zero, zero, and I end up with the zero vector, as you can see there. So let's look at the bottom here.
Remember, y decomposes as y = y∥ + y⊥, and if I isolate y∥, that's y − y⊥. We said we have this approximation Aβ̂ ≈ y. I can now make it an equals sign, because I'm solving for y∥ here, and y∥ is in the plane — in other words, there is a linear combination of the two column vectors of A that gives me exactly y∥. But I'm rewriting y∥ as y − y⊥. Now look what happens — and this is the beauty of this geometric interpretation of least squares. If I left-multiply both sides of the equation by Aᵀ, the left-hand side becomes AᵀAβ̂. On the right, distributing Aᵀ into y − y⊥ gives Aᵀy − Aᵀy⊥, but we've already seen that Aᵀy⊥ is just the zero vector, so I can throw it away. In the end this left multiplication by Aᵀ leaves me, with an equality sign, with AᵀAβ̂ = Aᵀy. And that's really beautiful, because think about AᵀA: A is not even a square matrix — in our instance it's a 3×2 matrix — but if I left-multiply it by its transpose, I end up with a square matrix, and a square matrix might potentially have an inverse. If the columns of A are linearly independent, which we'll see is the case in our problem, then AᵀA will be invertible — the inverse will definitely exist. And that's going to help us a lot.
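As a quick numerical sanity check of this argument (my own sketch, not the notebook's code): if we solve the normal equations AᵀAβ̂ = Aᵀy and form the residual y − Aβ̂, which plays the role of y⊥, then Aᵀ times that residual should indeed come out as the zero vector.

```julia
using LinearAlgebra

x = [1.5, 2.0, -1.0]
y = [1.0, 5.0, -2.0]
A = [x x.^2]              # the 3×2 matrix from the quadratic setup

β̂ = (A' * A) \ (A' * y)   # solve the normal equations AᵀAβ̂ = Aᵀy
r = y - A * β̂             # the residual plays the role of y-perpendicular

A' * r                    # ≈ zero vector: r is orthogonal to both columns of A
```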
So let's get back to our problem. We see in equation 4 how we go from the approximation to the equality: Aβ̂ equals y∥, which is y − y⊥. I left-multiply by Aᵀ on both sides, and then the Aᵀy⊥ term falls away — it's just the zero vector. And lo and behold, if all the column vectors of A are linearly independent, AᵀA is a square matrix of which we can take the inverse, so we left-multiply both sides by that inverse. On the left-hand side, the inverse of AᵀA times AᵀA itself gives me the identity matrix, and the identity matrix times β̂ just leaves β̂. On the right-hand side we eventually get our equation for ordinary least squares: β̂ = (AᵀA)⁻¹Aᵀy. Now that's an equality, but remember that as far as my problem is concerned, the β̂ we get still produces an approximation. So let's see our specific problem.
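Equation 6 translates almost symbol for symbol into Julia. Here's a minimal sketch (the helper name `ols` is my own, not from the notebook):

```julia
using LinearAlgebra

# Ordinary least squares, equation 6: β̂ = (AᵀA)⁻¹ Aᵀ y.
# Valid when the columns of A are linearly independent, so that AᵀA is invertible.
ols(A, y) = inv(A' * A) * (A' * y)
```

Note that for a tall matrix, Julia's backslash operator `A \ y` returns the same least-squares solution without explicitly forming the inverse, which is the numerically preferred route in practice.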
There I have A with my two column vectors, I have my vector β̂, and I have my right-hand side. This is what we're going to end up with, and it is still going to be an approximation, even though I can now solve for β̂₀ and β̂₁ — we've just shown that we cannot reach that vector y; it's outside of the column space. Let's prove that. If I take my matrix A, with columns (1.5, 2, −1) and (2.25, 4, 1), and take its rank — remember, the rank tells me whether the column vectors are linearly independent — we see we get a value of 2. In other words, the rank is 2, and those two columns are linearly independent. Another way to show that these three vectors are linearly independent — in other words, that y cannot be in the column space of those two vectors — is to take all three of them together as three column vectors (I'm just adding y as a third column) and take the reduced row echelon form, using rref from the RowEchelon package. Applying that, I get the identity matrix. That means only the zero vector solves the homogeneous system; in other words, those three vectors are linearly independent and span three-space, so y cannot be in the column space of A. Yet another way to look at it is to construct an equation for the plane, and remember, for that we need a point in the plane and the normal (perpendicular) vector. How do we get the perpendicular vector?
Well, we take the cross product of the two vectors in the plane, and we have those. So if we take the cross function of our two column vectors, we end up with a normal vector, and you see in equation 8 how to construct the equation of a plane through the origin in three-space: a, b, and c are the components of this normal vector, and x₀, y₀, z₀ would be a point in the plane — and I have points, namely the tips of either of the two vectors in that plane. If we solve for that, the way it's set up, the coefficients are obviously just going to be a, b, and c from the normal vector. And what we can see is that if we plug (1, 5, −2) in there, it doesn't satisfy the equation — that point, that vector, is definitely not in the plane. So I hope that makes sense and you understand what this problem is all about. Now let's just use Julia to solve it. I create a matrix called A_quadratic with my two column vectors in it, and I create a vector y, as you can see there. Now I just use equation 6 to solve: I take the inverse, as you can see there, of the transpose times the matrix itself, multiply that inverse by the transpose of my matrix, and multiply that by y, and I get the solution. It's going to be a column vector with two elements, which are — very neatly for us — β̂₀ and β̂₁. So in this cell I'm just saving them individually so that I can make the plot we see here: I create x values and quadratic y values, and I'm using the very first equation we looked at.
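Put together, the independence and plane checks described above might look like this (a sketch under assumed variable names; `rref` comes from the RowEchelon package the notebook installs, `rank`, `cross`, and `dot` from LinearAlgebra):

```julia
using LinearAlgebra, RowEchelon

x = [1.5, 2.0, -1.0]
y = [1.0, 5.0, -2.0]
A = [x x.^2]

rank(A)             # 2: the two columns of A are linearly independent
rref([A y])         # 3×3 identity: x, x², and y together span all of 3-space,
                    # so y is not in the column space of A

n = cross(x, x.^2)  # normal vector of the plane spanned by the columns of A
dot(n, y)           # nonzero, so y does not satisfy the plane equation n ⋅ v = 0
```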
That is, β̂₀ times the x value plus β̂₁ times the x value squared. I'm creating a lot of points using a range object, and I'm just using collect on that to get a vector of values, then computing all the y values so that when I plot them they form a line of all these x-y pairs. So there you can see my three original points and the quadratic approximation. It's the best fit to these three points, but it does not go through them, because the y values, as a vector, are not in the column space of A. And remember, we didn't add a y-intercept, so the curve goes through the origin. Now, with these three points, if I fit a cubic equation, that will be an exact solution to this problem — it will go through all three points. Let me show you in equation 10 very easily: we now set up our column vectors as x, x², and x³, and now we need three coefficients. First I'll show you that if I take x as a column vector, x² as a column vector, and x³ as a column vector — remember, each element squared and then cubed — and do row reduction to reduced row echelon form, we end up with the identity matrix, so those three are definitely linearly independent. So I create a matrix here called A_cubic, which you can see there: my first column is my x values, then I square each x value, and then I cube each x value. Then I use equation 6 again — just ordinary least squares — which gives me three solutions, β̂₀, β̂₁, and β̂₂, which I save individually there. I'm doing exactly the same thing as before; I don't have to create the x values again, because remember we already created them, and now I'm just setting up my cubic equation as you can see here. And there we see our solution.
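The cubic fit can be sketched the same way (variable names assumed): with three linearly independent columns, A_cubic is square and invertible, so the least-squares solution reproduces y exactly.

```julia
using LinearAlgebra

x = [1.5, 2.0, -1.0]
y = [1.0, 5.0, -2.0]

A_cubic = [x x.^2 x.^3]                       # 3×3 matrix with independent columns
β̂ = inv(A_cubic' * A_cubic) * (A_cubic' * y)  # equation 6 again

A_cubic * β̂   # ≈ y: the cubic passes through all three points (and the origin)
```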
With the code down here we're just plotting the three points again, along with the line from all the points we created up here, and the cubic equation goes through all the points. And again, I have not set up a y-intercept, so it also goes through the origin. Now this might all look beautiful — and if I had four points I'd go to a fourth-order polynomial, and so on — but it's kind of useless when it comes to applications. One of the applications of least squares is building a model, and we want a model that does not overfit our data. Here's our data, and this model clearly overfits, because it fits this data exactly. When we put unseen data into this model — new x values, for instance — it will calculate what y should be, but it might be way off, because this method has, as we said, just memorized the data and fits it perfectly, and in practice that usually leads to overfitting. What we would want in a situation like this is a linear approach, a linear approximation. And how do we set that up? I'm just reminding you that a straight line is y = mx + c, or mx + b, whatever you had at school, m being the slope and c being the intercept. We're going to set the slope m to β̂₁, and then we'll have the y-intercept β̂₀. To set that up as vectors, in equation 11 I have to introduce a column vector of all ones. So if we look at the first point, (1.5, 1): 1 is approximately equal to β̂₀ — remember, it's just β̂₀ times 1, so just β̂₀ — plus 1.5 times β̂₁.
So that is the equation y = mx + c for each of the three points, and setting up this column of ones makes complete sense. Here I'm also showing a shorthand notation of what's happening, with slightly different symbols, just to show you that it exists: if I multiply A times x — and this x is not the x coordinates of the points but the vector of β values that we're looking for — I'm not going to get y exactly; I'm going to get a predicted ŷ, some approximation of y. So that's just a different way to think about the final solution, not part of the way I've explained it here. Again, just very quickly: if we want to set up an equation for the plane, the cross product of the two column vectors gives me the normal vector, and its components are the coefficients used in the plane equation. So there's the plane, just to show you, and you can plug in (1, 5, −2) to see it's not a solution — it's certainly not in that plane either. So let's set up my matrix A_linear: a column vector of all ones and a column vector of all my x values. I use the ordinary least squares equation 6, and we solve for β̂₀ and β̂₁, which we have there. And again, I'm just creating a bunch of points here so that I can plot my linear equation: it's the linear β̂₀ and then .+, because I want element-wise addition of that scalar to β̂₁ times all my x values. And there we go — we see a beautiful linear approximation, the line of best fit. In statistical terms, I have minimized the residuals: the differences between the actual y values and what the model predicts for any given independent (x) variable input. So there's my linear approximation.
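The linear fit with the column of ones can be sketched like this (again with assumed variable names):

```julia
using LinearAlgebra

x = [1.5, 2.0, -1.0]
y = [1.0, 5.0, -2.0]

A_linear = [ones(3) x]                           # column of ones carries the intercept
β̂ = inv(A_linear' * A_linear) * (A_linear' * y)  # equation 6

β̂[1], β̂[2]   # intercept β̂₀ and slope β̂₁ of the line of best fit
```

The fitted line is then evaluated element-wise as `β̂[1] .+ β̂[2] .* xs` for a vector of plotting points `xs`.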
It is just an approximation of the data, but it will probably fit unseen data better — we're not overfitting, we're not memorizing the data. These are terms I'm borrowing from machine learning and from statistics, just to show you that what we have here is very applicable in the real world. So there we go: using Julia to understand ordinary least squares, the concepts behind it, and how to use a few very simple lines of code to get these solutions.