Welcome back, everyone, to our lecture series based on the textbook Linear Algebra Done Openly. As usual, I am your professor, Dr. Andrew Misseldine. It's good to have you all. We are in the final chapter, Chapter 4, on orthogonality, and in particular we're on Section 4.8, entitled "The Least Squares Problem." This section actually works as a really great capstone, a capstone lecture I should say, for much of what we've done in this course, because up until now we've been obsessed with this matrix equation Ax = b. I might refer to this as the linear problem. This is the problem that nearly everything we've done in linear algebra has revolved around: trying to solve it, trying to understand it. I think we've looked at this problem from something like 17,000 different perspectives, and that's not much of an exaggeration.

Let me give you a few examples of what we have said about the linear problem. One issue we've been interested in is whether it's consistent: does it have a solution, or is the solution set empty? We've learned how to answer this. It has a lot to do with whether, when you row reduce the matrix, there is a row of zeros; that is, we're interested in the pivot rows of the matrix. The zero rows correspond to the non-pivot rows, and the pivot rows have a lot to do with both the row space and the left null space, as we saw last time with the Fundamental Theorem of Linear Algebra. Consistency also has a lot to do with whether the columns of the matrix span the vector space F^m: if the columns span everything, then we can hit every right-hand side, so there is no choice of b that makes the system inconsistent. Or, if we think of it as a problem about linear transformations, the system being consistent for every right-hand side has a lot to do with whether the map is surjective, or what we call onto.

Now, when it is consistent, when is the solution unique? This has a lot to do with the variables: do we have free variables, or are all of our variables dependent variables? That is a question about the pivot columns versus the non-pivot columns, which in turn relates to linear independence: when the column vectors are linearly independent, we get unique solutions. And if we think of the linear system instead as a linear transformation problem, we're talking about whether the map is injective, or one-to-one. So that gives you a few of those 17,000 perspectives I was talking about.

When a solution x exists, we can orthogonally decompose it into two pieces. There is x_null, which lives inside the null space (remember, the null space is the solution set of the homogeneous system Ax = 0), and there is the other part, x_row, which lives inside the row space of A. That particular solution, x_row, is the shortest solution in the solution set, the one closest to zero. So again, this is just a sample of what we've seen before. We understand this linear problem very well; we've talked about it a lot, we've analyzed it to death. But one thing we really haven't talked about at all is what happens when the system is inconsistent. Hmm. Think about that for a second; let it sink in. We know how to recognize when a linear system is consistent, and therefore we know how to recognize when it's inconsistent. But when it's inconsistent, what do we do?
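Just to record that last decomposition in symbols (written in one common notation, which may differ slightly from the textbook's), a solution x of Ax = b splits orthogonally as

\[
x = x_{\mathrm{row}} + x_{\mathrm{null}}, \qquad x_{\mathrm{row}} \in \operatorname{Row}(A), \quad x_{\mathrm{null}} \in \operatorname{Null}(A),
\]

and since Ax_null = 0, the piece x_row is itself a solution; because the two pieces are orthogonal, \(\|x_{\mathrm{row}}\| \le \|x\|\) for every solution x, which is why x_row is the shortest one.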
Do we just give up, like "Game over, Mario: your princess is in another castle"? I mean, what do we do? When you have that sort of sad ending, no solution, we get angry, and when we get angry we play the blame game. So if the system is inconsistent, who should I blame? Well, most of the blame should be placed on this vector b right here, and a little bit can be placed on A, and I'll say what I mean by that right now.

The vector b, which lives in F^m, can also be orthogonally decomposed into two vectors. There is the vector b_leftnull, which lives inside the left null space of A, and there is the other vector, b_col, which lives inside the column space of A. Remember, the left null space represents those vectors which are necessarily going to be inconsistent; you can't get them to work. In fact, our system of equations is inconsistent if and only if this b_leftnull part is nonzero. So the inconsistency is to be blamed on b, which has a nonzero left null part. Now, this is the left null space of A, so again A does hold some of the blame; we just chose the wrong vector b.

Think of it in terms of choosing vectors at random. If we pick a random vector x, whether or not it is a solution depends on the null space: the bigger the null space is, that is, the larger the nullity, the better the chance that a randomly selected vector is a solution to the system. On the other hand, the left null space measures our chances of even having a consistent system: if the vector b is chosen randomly inside F^m, the size of the left null space (the co-nullity) measures how many directions we should be avoiding, because only the vectors with zero left null part, that is, the vectors in the column space, give us something consistent.

So when your system is inconsistent, the question we want to ask is no longer "is there an x so that Ax = b?", because we know there is none. There's none; there is no vector that does that. In this situation we change the question: instead of asking whether there is a vector x so that Ax = b, we ask whether there is a vector x so that Ax is approximately b. And what do we mean by approximately? We use that word all the time, like "pi is approximately 3.14," but what does that mean?
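In the same notation, the decomposition of b and the consistency test can be written as

\[
b = b_{\mathrm{col}} + b_{\mathrm{leftnull}}, \qquad b_{\mathrm{col}} \in \operatorname{Col}(A), \quad b_{\mathrm{leftnull}} \in \operatorname{Null}(A^{\top}),
\]

\[
Ax = b \ \text{is consistent} \iff b_{\mathrm{leftnull}} = 0 \iff b \in \operatorname{Col}(A).
\]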
Approximation means that you are close. That is, this is a statement about distance. In analytic geometry our distance formula typically takes the form sqrt((x1 - x2)^2 + (y1 - y2)^2): you take (x1 - x2) squared plus (y1 - y2) squared, and it all sits inside a square root. The norm that we use for vector spaces is a generalization of this formula; norms and inner products are how we measure distance. So we have to use the vector distance to measure how close Ax can be to b.

That brings us to the main definition for this section. If A is an m-by-n matrix and b is a vector in F^m, a least squares solution to the linear problem Ax = b is a vector x̂ inside F^n such that the length of the vector b - Ax̂ is less than or equal to the length of b - Ax for all vectors x inside F^n. So x̂ is chosen so that Ax̂ is closest; Ax̂ is as near to b as we can get. Finding this vector x̂ is what's known as the least squares problem, and finding a solution to the least squares problem is what we're going to try to do right here. The name "least squares" comes from the fact that we're trying to minimize a distance, and the distance formula is all sums of squares. The formula we wrote down has two squares because we were in two dimensions; if we had three dimensions, there would be three squares; if it's 17 dimensions, we'd have 17 squares. We are trying to minimize those squares, hence the least squares problem.

I want to mention to you that the solution to the least squares problem can coincide with the solution to the linear problem. If the linear equation Ax = b is consistent, then it has a solution; let's say x̂ is a solution, that's just the name we give it. Since x̂ is a solution, Ax̂ = b. And if Ax̂ = b, we can subtract Ax̂ from both sides of the equation, it cancels, and we get that b - Ax̂ is equal to zero. If b - Ax̂ is the zero vector, then its length is the length of the zero vector, which is zero, and you can't get any closer than zero distance. So if the linear problem has a solution, the least squares problem will have that same solution: when your linear problem is consistent, the least squares solution gives you the linear solution.

But what's really useful for us is that when the linear problem is inconsistent, the least squares problem can still provide us an approximate solution. What I mean by that is the following. Take a generic vector x inside F^n. This is kind of like a variable; we have no specification on it.
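Stated compactly, the definition says:

\[
\hat{x} \in F^{n} \ \text{is a least squares solution of} \ Ax = b
\quad\Longleftrightarrow\quad
\| b - A\hat{x} \| \le \| b - Ax \| \ \text{for all} \ x \in F^{n}.
\]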
Well, if x is a generic vector in F^n, then Ax will be a generic vector in the column space, because after all the column space is just the set of all vectors Ax where x is a vector in F^n. If x is allowed to vary, Ax will vary over all the possibilities in the column space.

Now let b̂ denote the orthogonal projection of b onto the column space of A. This is the same thing as the vector b_col from the orthogonal decomposition we did before, b = b_col + b_leftnull: the orthogonal projection onto the column space is just what we meant by b_col. Now use the Best Approximation Theorem, which we talked about earlier in this chapter. As a reminder, it says that if you have a flat and a point off of that flat, the shortest path from the point to the flat is the perpendicular path, the orthogonal path; if we chose any other path, it would be longer. So the distance between b and b̂ is going to be shorter than the distance from b to any other vector in the column space, and Ax is a generic vector in that column space.

But wait a second: b̂ is in the column space of A, and that means we can write it as A times something; we can factor it. So there has to be some vector x̂ so that Ax̂ = b̂. That lets us rewrite the distance from b to b̂ as the length of b - Ax̂, which is therefore less than or equal to the length of b - Ax for every x. So this gives us a least squares solution, telling us that the least squares problem is always, always, always consistent. Even if the linear problem is inconsistent, the least squares problem will always have a solution. So when the linear problem is consistent, the least squares solution is the same solution as the linear problem, and when it's inconsistent, it gives us a solution we didn't have before. You can start to see the power of the least squares problem and why we want to solve it.

Okay, that tells us why we want to solve it, but how does one actually solve it? How do you find x̂? "There it is." Well, that doesn't count; we have to compute it somehow. Going back to what we were talking about before, b - Ax̂ equals b - b̂, where b̂ is the orthogonal projection of b onto the column space. Like we saw before, b is equal to b_col plus b_leftnull, so when you subtract, b - b̂ is going to end up being b_leftnull. So b - b̂ lives inside the orthogonal complement of the column space, which is none other than the left null space; that is, b - Ax̂ is inside the left null space. And remember, the left null space is the set of all vectors y such that A^T y = 0, because the left null space is just the null space of A^T. I should mention that if you're solving the least squares problem over a complex matrix, a complex system of linear equations, you have to replace all of these transposes with conjugate transposes, as usual. So b - Ax̂ is inside the left null space, and when you multiply it by A^T, you have to get zero. Distribute the A^T here and then move one term to the other side.
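Putting those last two observations into symbols, the Best Approximation Theorem gives

\[
\| b - A\hat{x} \| = \| b - \hat{b} \| \le \| b - Ax \| \quad \text{for all } x \in F^{n},
\]

and since \(b - A\hat{x} = b_{\mathrm{leftnull}} \in \operatorname{Null}(A^{\top})\), multiplying by \(A^{\top}\) yields

\[
A^{\top}\bigl( b - A\hat{x} \bigr) = 0.
\]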
You're going to get that A^T A x̂ = A^T b. And this equation right here, this matrix equation, can be pretty important. If you have the linear problem Ax = b, you can compute what are called the normal equations, A^T A x = A^T b, and a solution to the normal equations gives us a least squares solution to the linear equations. Why do we call them the normal equations, plural, even though you see a single matrix equation? Well, that's just because one matrix equation represents a whole system of linear equations, so there's more than one there. And we "normalize" by multiplying both sides of the equation by A^T: you take the original linear problem and multiply both sides on the left by A^T, and that by itself is not a hard thing to do. What's impressive is that the theory we've developed in this chapter shows that if you take the linear equation Ax = b and multiply both sides on the left by A^T, the solutions to those normal equations will be the least squares solutions. It's really impressive, and it's a very easy thing to do.
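If you want to see the normal equations in action numerically, here is a minimal sketch in Python using NumPy; the 3-by-2 matrix and the right-hand side are just made-up illustration data for an inconsistent system, not an example from the textbook. It forms A^T A and A^T b, solves the normal equations, and checks the answer against NumPy's built-in least squares routine.

```python
import numpy as np

# A small inconsistent system: three equations, two unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Form the normal equations  A^T A x_hat = A^T b  and solve them.
AtA = A.T @ A
Atb = A.T @ b
x_hat = np.linalg.solve(AtA, Atb)

# Compare with NumPy's built-in least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat)                            # solution of the normal equations
print(x_lstsq)                          # agrees up to rounding
print(np.linalg.norm(b - A @ x_hat))    # the minimized distance ||b - A x_hat||
```

In practice, library routines like lstsq avoid forming A^T A explicitly, because that can amplify rounding error, but for a small example the normal equations carry out exactly the computation described above.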